Site Reliability Engineer operating on Confluent Cloud for government clients. Ensuring system reliability and compliance with FedRAMP standards in a hybrid working model.
Responsibilities
Understand and participate in the changing FedRAMP space by quickly ramping up on the FedRAMP 20x controls and building on them to maintain federal compliance
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies
Deploy production changes to Confluent Cloud systems and infrastructure through established change management processes
Assist with process improvements and adoption of change management
Own monitoring and incident handling of complex distributed systems, engaging engineering teams through an escort model when needed
Act as a core member of Confluent's Business Continuity Plan and Disaster Recovery team with efforts across 3 large verticals
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier
Participate in a 24/7 on-call rotation to maintain the integrity of Confluent Cloud for Government systems
Requirements
0-2 years of relevant experience
Experience with cloud-native technologies, including operating production services in the cloud
Understanding of distributed systems fundamentals and design
Knowledge of Kubernetes and containerization
Proficiency in infrastructure as code (Terraform preferred)
Experience with telemetry tooling to monitor production systems (Datadog, Grafana, Prometheus)
Exposure to and understanding of BCP/DR and high-availability exercises
Ability to quickly problem-solve and troubleshoot critical services
Proficiency with scripting and automation (e.g., Go, Java, Python, Bash)
Exceptional teamwork, collaboration skills, and the ability to think critically and act with minimal supervision at times in a remote-first environment
Experience with a rotating on-call schedule to provide 24/7 support
BS degree in Computer Science, Engineering, or equivalent experience
Jr. DevOps Engineer supporting and improving CI/CD pipelines and Linux systems at Swift. Collaborating with senior engineers in a hands-on learning environment.
Senior DevOps Engineer I managing automation tooling and multi-cloud infrastructure at Spring Health. Collaborating with AI and Infrastructure teams in a hybrid Seattle office.
Site Reliability Engineer for cloudified backup platform using Commvault technology at Expleo. Joining a dynamic team to ensure backup infrastructure scalability and reliability.
Site Reliability Engineer responsible for designing and maintaining scalable services with high availability. Collaborating with development teams to enhance reliability and operational excellence.
Technical Staff leading the architecture, reliability, and modernization of enterprise ALM and DevOps tools. Driving strategy and influencing product development in collaboration with various teams.
Site Reliability Engineer responsible for reliability and availability, collaborating with development teams on scalable systems. Applying software engineering practices to improve production operations.
DevOps Engineer in the Security Data and AI Lab at Lloyds Banking Group driving data and cloud infrastructure's influence on product operations and customer service improvements.
Senior Platform DevOps Engineer at Code Metal designing and implementing cloud and hybrid infrastructure to support customer deployments and internal platforms. Collaborating with software and security teams for reliable delivery.
DevOps Platform Intern managing cloud infrastructure and deployment pipelines for AI-native software delivery. Partnering with a Product Development Intern to set up and manage containerized applications on Azure Kubernetes Service.