Senior Site Reliability Engineer focused on developing and maintaining OpenShift-based platform solutions at Red Hat. Responsible for software automation, onboarding new services, and maintaining service reliability.
Responsibilities
Design, write, and maintain software (primarily in Python and Golang) that automates the deployment, monitoring, and maintenance of Red Hat managed services.
Onboard new services onto our OpenShift-based platform, adhering to cloud-native design principles and best practices to ensure reliability, scalability, and security; contribute to documents, such as standard operating procedures (SOPs) and playbooks, that assist in issue resolution and new-service onboarding.
Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.
Participate in an Agile Scrum team that scopes, prioritizes, and allocates work items.
Participate in an on-call rotation that is responsible for responding to service incidents.
Requirements
5+ years of relevant work experience
Background writing object-oriented automation software in Python; experience with Golang is a plus
Background administering production cloud-native services, preferably containerized and deployed via a container-orchestration system like Kubernetes or OpenShift
Experience diagnosing service failures and carrying out incident response procedures
Familiarity with the Linux operating system and its configuration
Ability to effectively work in a globally distributed team
Understanding of computer networking and protocols, including TCP/IP and DNS
Understanding of computer security and cryptography basics, including certificates, TLS, and credential-storage systems like Vault, is a plus
Familiarity with CI/CD pipeline concepts and systems, like Jenkins and Tekton/Argo, is a plus
Familiarity with observability tools like Prometheus and Grafana, and with defining metrics that measure service health and reliability, is a plus
Sr. Systems Engineer implementing and optimizing CI/CD platforms at Arch Capital Group. Collaborating with teams and driving DevOps strategy with expertise in cloud technologies.
Java Full Stack and AWS DevOps Developer for Boeing's Manufacturing Quality Information Technology Team, maintaining and enhancing software systems and DevOps environments while ensuring compliance.
Senior DevOps Engineer at One Pass redefining health engagement, managing scalable cloud infrastructure and enhancing automation. Collaborate across teams to ensure system reliability and performance.
DevOps Engineer at One Pass building and improving cloud infrastructure in AWS. Collaborating with engineers on deployments, reliability, and automation in a fast-paced environment.
Site Reliability Engineer maintaining cloud infrastructure reliability for Tecsys solutions. Collaborating across teams to support services and implement automation, observability, and frameworks.
Senior Release Engineer designing CI/CD pipelines for Kaseware’s mission-critical software. Collaborating with engineering, security, and operations teams to ensure fast and reliable deployments.
DevOps Engineer managing Kubernetes and cloud infrastructure for an innovative legal software startup. Collaborating with development teams and ensuring smooth deployment processes.
DevOps Architect defining and evolving AgencyBloc’s cloud and DevOps strategy. Leading design of infrastructure and CI/CD frameworks for secure and scalable SaaS platforms.
DevOps Engineer at VERBI Software GmbH managing AWS-centric infrastructure and driving reliability, scalability, and modernization. Hands-on role applying SRE principles to evolve towards cloud-native best practices.
Sr. DevSecOps Engineer I at MetroStar ensuring integration of security best practices in development and operations lifecycle. Collaborating in delivering high-quality solutions for federal government applications.