Senior Data Reliability Engineer ensuring software reliability and quality across enterprise applications. Collaborating with teams to implement robust on-call processes and maintain data fidelity.
Responsibilities
Evangelise SRE & DRE across engineering
Lead the build-out of a data quality framework that gives our customers strong guarantees about the fidelity of our data and supports our marketing and revenue functions
Define and own the on-call process as the SRE function, including:
Quickly establishing a strong working knowledge of our systems
Commanding incidents
Running mop-ups
Ensuring follow-up actions are completed to your schedule
Evaluating and improving our existing E2E on-call process
Take part in the on-call rotation, one week every 4–5 weeks (24x7x365 coverage)
Evaluate, manage and maintain our existing solutions for monitoring, alerting, paging, response and documentation
Report on uptime, availability, performance, etc. across our product suite
Write post-mortems for both internal and external consumption
Represent our SRE & DRE function on sales calls with tier-one enterprise financial institutions
Work with product, sales and customer service to define SLAs for different products and use cases
Work with internal product teams to define SLOs for internal consumption and measurement
Work with our engineering teams directly to embed DRE practices
Requirements
Proven experience in leveling up the quality and reliability of large datasets, not just services and APIs
Experience leading site reliability for a high-volume SaaS product
Supported distributed systems in AWS
The presence and empathy required to hold teams to account
Defined SLAs / SLOs, both internal and client-facing
Delivered post-mortems to enterprise clients (verbal and written)
Benefits
Hybrid working and the option to work from *almost* anywhere for up to 90 days per year
£500 Remote working budget to set up your home office space
$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Holidays: 25 days of annual leave + bank holidays
An extra day for your birthday
Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, with 16 weeks of fully paid leave.
Private Health Insurance - we use Vitality!
Full access to Spill Mental Health Support
Life Assurance: we hope you will never need this - but our cover is for 4 times your salary to your beneficiaries
Full-Stack Engineer enhancing engineering productivity at Fidelity. Building internal tools for SRE teams to improve operational efficiency and reliability.
DevOps Engineer at Cloudogu working with development and operations for reliable software delivery. Focusing on CI/CD, infrastructure automation, and platform services in an agile environment.
Jr. DevOps Engineer supporting and improving CI/CD pipelines and Linux systems at Swift. Collaborating with senior engineers in a hands-on learning environment.
Senior DevOps Engineer I managing automation tooling and multi-cloud infrastructure at Spring Health. Collaborating with AI and Infrastructure teams in a hybrid Seattle office.
Site Reliability Engineer for cloudified backup platform using Commvault technology at Expleo. Joining a dynamic team to ensure backup infrastructure scalability and reliability.
Site Reliability Engineer responsible for designing and maintaining scalable services with high availability. Collaborating with development teams to enhance reliability and operational excellence.
Technical Staff leading the architecture, reliability, and modernization of enterprise ALM and DevOps tools. Driving strategy and influencing product development in collaboration with various teams.
Site Reliability Engineer responsible for reliability and availability, collaborating with development teams on scalable systems. Applying software engineering practices to improve production operations.
DevOps Engineer in the Security Data and AI Lab at Lloyds Banking Group driving data and cloud infrastructure's influence on product operations and customer service improvements.