Senior Specialist – Cloud SRE at Datavail | Hybrid Hired

About the role

Senior Site Reliability Engineer role focused on improving the reliability and performance of business-critical services across multi-cloud environments (AWS, Azure, and GCP). Collaborate with engineering teams to drive automation and measurable outcomes.

Responsibilities

Reliability Engineering & SRE Practices
Define, implement, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services.
Continuously monitor SLO compliance and drive improvements based on error budget consumption.
Participate in architecture reviews focused on high availability, disaster recovery, scalability, and fault tolerance.
Lead incident response, acting as the Tier-3 escalation point for SRE and operations teams.
Drive blameless postmortems and Root Cause Analysis (RCA), and ensure corrective and preventive actions are implemented.
Define and maintain incident response runbooks, escalation paths, and on-call processes.
Track and improve key reliability metrics including MTTR, incident frequency, and change failure rate.
Automate infrastructure provisioning and operational workflows using Terraform, CloudFormation, and AWS CDK.
Build and maintain CI/CD pipelines supporting canary deployments, blue/green strategies, and automated rollbacks.
Implement event-driven automation and auto-remediation using AWS Lambda, Step Functions, or Azure Functions.
Continually identify and eliminate operational toil through automation and self-healing systems.
Design, implement, and operate end-to-end observability platforms covering metrics, logs, and traces.
Ensure alerts are SLO-driven, actionable, and noise-free.
Provision and manage cloud infrastructure across AWS, Azure, and/or GCP.
Operate compute, storage, networking, load balancers, VPNs, and private connectivity.
Manage patching, backups, encryption, IAM/RBAC, and disaster recovery readiness.
Optimize performance and cost through rightsizing, autoscaling, and capacity planning.

Requirements

8–10 years of experience in SRE, Cloud Engineering, or Production Operations roles.
Strong OS fundamentals on Linux and Windows, including scripting in Bash and PowerShell.
Strong programming skills in Python, Go, or equivalent.
Proven hands-on experience with:
Infrastructure as Code (Terraform, CloudFormation, CDK)
CI/CD pipelines and deployment automation
Observability tools (New Relic, Datadog, Prometheus, Grafana, Graylog, ELK)
Distributed systems at production scale
Cloud certifications (one or more):
AWS (Associate or Professional)
Azure (AZ-104 / Architect Expert)
GCP (Professional Cloud Architect)
Cloud-agnostic certification such as Terraform Associate, CKA, or SRE Foundation.
