Senior Site Reliability Engineer at qode.world | Hybrid Hired

About the role

Senior Site Reliability Engineer driving observability and reliability for business-critical systems at Incedo. Collaborating with engineering teams to enhance system resilience and performance.

Responsibilities

Design, implement, and maintain observability solutions across distributed systems
Build and optimize logging, metrics, and tracing pipelines using tools like Dynatrace, Datadog, Splunk, ELK, Grafana, and OpenTelemetry
Enable end-to-end transaction tracing across microservices and APIs
Develop dashboards and alerting strategies for proactive issue detection
Own service reliability, uptime, and operational performance for critical systems
Lead incident response, root cause analysis (RCA), and postmortems
Reduce **MTTD and MTTR** (mean time to detect and mean time to resolve) through automation and improved observability
Create and maintain runbooks and incident response playbooks
Monitor and optimize system performance (latency, throughput, error rates)
Partner with application and database teams to troubleshoot bottlenecks
Use distributed tracing and telemetry data to identify and resolve issues
Implement performance testing and tuning strategies
Build and maintain fault-tolerant, highly available systems
Implement resiliency patterns (failover, retries, circuit breakers, self-healing)
Drive chaos engineering practices to validate system reliability
Automate operational tasks using scripting (Python, Go, etc.)
Define and enforce SLOs, SLIs, and error budgets aligned to business goals
Promote SRE principles across engineering teams
Partner with DevOps and platform teams to improve CI/CD reliability
Contribute to building a culture of operational excellence and accountability

Requirements

7–10+ years of experience in **Site Reliability Engineering or Production Support Engineering**
Strong hands-on experience with observability tools (Dynatrace, Datadog, Splunk, ELK, Grafana, OpenTelemetry, Jaeger)
Experience supporting **cloud-native environments (AWS, Azure, or GCP)**
Deep understanding of **microservices architecture and distributed systems**
Proficiency in scripting/programming (Python, Go, Java, or similar)
Experience with monitoring, alerting, and incident management in production environments

Similar roles

DevOps Manager – USAF Cloud One

Leidos

DevOps Manager leading a team that delivers multi-cloud solutions for the USAF Cloud One project. Focus on scalable cloud-native solutions and CI/CD practices.

Hybrid Role

United States | DevOps Engineer

$131,300 - $237,350 per year

Lead Cloud Site Reliability Engineer

Lloyds Banking Group

Lead Site Reliability Engineer overseeing SRE practices across Azure and GCP platforms. Driving reliability improvements and leading a team at Lloyds Banking Group.

Hybrid Role

Halifax, United Kingdom | DevOps Engineer

£92,701 - £109,060 per year

DevOps Engineer – Microsoft Intune

Bundesdruckerei-Gruppe

DevOps Engineer responsible for managing Microsoft Intune operations at Bundesdruckerei GmbH. Focused on ensuring secure digital solutions for identity and data protection in Berlin.

Onsite Role

Berlin, Germany | DevOps Engineer

DevSecOps Specialist

Vanguard

DevSecOps Specialist securing the software development lifecycle at Vanguard. Collaborating with teams to improve application security tooling and processes, and provide development guidance.

Hybrid Role

Dallas, United States | DevOps Engineer

Site Reliability Engineer – Compute

Scaleway

Site Reliability Engineer automating infrastructure deployment for Scaleway's sovereign cloud products. Collaborating with product teams to enhance observability and reliability of the platform.

Hybrid Role

Paris, France | DevOps Engineer

DevOps Team Lead

Bromcom

DevOps Team Lead with hands-on Azure experience at Bromcom. Leading technical delivery and team coordination for Azure infrastructure management.

Hybrid Role

Bromley, United Kingdom | DevOps Engineer

Reliability Engineer

Wood

Reliability Engineer responsible for equipment reliability and safety using data-driven analysis for Wood in Aberdeen. Focus on proactive maintenance and operational efficiency.

Hybrid Role

Aberdeen, United Kingdom | DevOps Engineer

Principal Safety & Reliability Engineer

Ultra Intelligence & Communications

Principal Safety and Reliability Engineer developing and supporting safety design for mission-critical aerospace systems. Engaging in design reviews and ensuring compliance with requirements.

Hybrid Role

Cambridge, United Kingdom | DevOps Engineer

Cloud DevOps Engineer

BTS

Cloud DevOps Engineer playing a pivotal role in developing migration plans for Coast Guard Cloud Architecture. Collaborating with teams to ensure effectiveness and best practices in cloud implementation.

Hybrid Role

San Diego, United States | DevOps Engineer

$200,000 - $225,000 per year

Reliability Engineer III

Daimler Truck North America

Reliability Engineer III at Daimler Truck developing propulsion technology solutions for electrified and conventional axle components. Leading testing and validation for complex powertrain systems.

Hybrid Role

Detroit, United States | DevOps Engineer

$109,140 per year