Site Reliability Engineer maintaining systems and infrastructure to ensure reliability and performance. Collaborating with developers and automating operational tasks for a robust cloud environment.
Responsibilities
Design and maintain reliable systems and infrastructure
Monitor system reliability and performance
Collaborate with development teams to ensure system robustness
Automate operational tasks and processes
Troubleshoot and resolve issues in production environments
Implement best practices for system availability, security, and performance
Mentor junior SRE team members
Requirements
5+ years of experience in site reliability engineering
Strong background in Linux/Unix systems
Proficient in scripting languages (Python, Bash, etc.)
Experience with cloud providers (AWS, Azure, Google Cloud)
Knowledge of CI/CD tools and processes
Understanding of application architecture and microservices
Excellent troubleshooting skills
Good communication skills and ability to work in a team environment
Background in the networking stack and protocols
Availability for on-call rotations as needed
Benefits
Medical provided through Cigna (PPO, HSA, EPO options)
Medical provided through Kaiser (HMO option only) for California employees only
Dental provided through Cigna (DPPO & DHMO options)
Nationwide Vision provided through VSP
Flexible Spending Account for Health & Dependent Care
Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
Continuing Education and Professional Development via various integrated platforms, e.g. Udemy and Coursera
Corporate Wellness Program
Employee Assistance Program
Wellness Days
401(k) Plan
Basic Life, Accidental Life, and Supplemental Life Insurance
Short Term & Long Term Disability
Critical Illness, Critical Hospital, and Voluntary Accident Insurance
Tuition Reimbursement (available 6 months after start date, capped)
Paid Time Off (accrued and prorated, maximum of 120 hours annually)
Paid Holidays
Any other statutory leaves, paid time, or other fringe benefits required under state and federal law
Principal AI Site Reliability Engineer driving operational excellence for critical contact center applications at Fidelity. Leading automation and observability initiatives to improve reliability and efficiency.
Data Transport Infrastructure DevOps Engineer at Leidos modernizing global-scale multi-cloud environments for USAF missions. Involves developing cloud-native solutions and ensuring security best practices.
DevOps Engineer responsible for building and optimizing AWS-based infrastructure and backend systems at Allguth GmbH. Part of a team focused on innovative mobility solutions in the Munich region.
(Senior) DevOps Engineer specializing in ML solutions implementation and management in Germany. Focused on CI/CD pipelines, automation, and cloud services.
Specialist DevSecOps joining Periferia IT Group, a leader in digital transformation. Work in a dynamic environment with continuous learning and professional development opportunities.
Asset Reliability Engineer providing maintenance advice and service innovations. Join Sensorfact, the leading smart monitoring platform, to modernize the industrial sector.
Join Zinkworks as a Senior Platform Engineer designing scalable IaC-driven cloud platforms for a large-scale enterprise contact centre. Focused on automation, reliability, and platform ownership in a hybrid work environment.
Cloud Operations Engineer responsible for securing AWS infrastructure at Avalon Healthcare Solutions. Collaborating on SRE best practices and ensuring system reliability and performance.
Design Release Engineer designing, developing, and releasing seat systems for Ford vehicles. Ensuring engineering deliverables meet quality, cost, and timing targets while collaborating with cross-functional teams.