SRE role at BT Group focusing on cloud reliability and operational excellence across engineering teams. Collaborate with product owners to implement SRE principles for improved service performance.
Responsibilities
Partner with Product Owners and engineering leads to embed reliability into roadmaps, backlogs, and delivery decisions.
Apply SRE principles (SLIs, SLOs, error budgets) to maintain service reliability, performance, and scalability.
Enhance observability across metrics, logs, traces, and events to ensure services are observable by design.
Manage infrastructure as code and CI/CD environments, delivering improvements and supporting operational changes.
Lead incident response and root cause analysis, driving effective resolution, post-incident reviews, and long-term prevention.
Work with cross-functional engineering teams to remove technical barriers, reduce toil, and improve service operability.
Provide hands-on engineering support, validating technical decisions and promoting best practices.
Foster a culture of curiosity, experimentation, and first-principles thinking to strengthen engineering excellence.
Requirements
Deep understanding of SRE concepts: SLIs, SLOs, SLAs, and error budgets
Proven ability to design and implement reliable environments
Hands-on experience with monitoring tools, application insights, and integrations with tools such as Prometheus and Grafana
Infrastructure as Code skills (e.g., Terraform)
Advanced knowledge of VMware technology
Experience with CI/CD, automation and monitoring tools
Experience with disaster recovery planning and chaos engineering practices
Experience implementing identity governance and security frameworks
Benefits
Flexibility in working hours
Reasonable adjustments for the selection process if required