SRE/DevOps Engineer improving platform reliability for a multi-award-winning digital payments platform. Working from UK offices and collaborating with engineers to build a developer-friendly platform.
Responsibilities
Design, build and maintain secure, scalable cloud infrastructure across AWS and Azure
Manage and enhance our Kubernetes (EKS) platform to support reliable, modern applications
Develop and maintain Infrastructure as Code using Terraform and Helm
Improve and support CI/CD pipelines using Argo Workflows, ArgoCD and GitHub Actions
Lead and participate in incident response, including on‑call activities and major incident coordination
Drive high‑quality monitoring, alerting and observability across metrics, logs and traces
Conduct and support blameless post‑incident reviews, ensuring follow‑up actions are delivered
Define and implement SLIs/SLOs to improve service reliability and operational excellence
Collaborate with engineering teams to embed best practices and improve developer experience
Contribute to automation, tooling, and continuous improvements that reduce toil and increase platform resilience
Requirements
Proven experience in DevOps, SRE, or Platform Engineering roles
Strong hands‑on experience running Kubernetes in production
Experience with AWS and/or Azure cloud platforms
Solid experience with Terraform and IaC automation
Experience participating in or managing production incidents and on‑call
Strong grasp of monitoring, alerting, and observability principles
Ability to diagnose and fix complex distributed systems issues
Demonstrated use of GenAI tools (ChatGPT, GitHub Copilot, Claude) in engineering workflows
Excellent communication and calmness under pressure
A passion for automation and reducing toil
Benefits
Competitive Salary
Company bonus scheme
Private Healthcare and Medicash plan
26 days holiday plus bank holidays and volunteer days
Tax-saving Salary Sacrifice Pension with Aviva
Salary sacrifice Cycle to Work, Octopus Electric Vehicle, and Nursery fee schemes
Access to a benefits platform with £250 per year for wellbeing and £150 per year for development
Bumper Flex policy for better work/life balance
Annual company-wide Bumper Retreat
4 months' paid leave for primary carers and 1 month for secondary carers