Site Reliability Engineer at Reward Gateway transforming operational workloads to an SRE approach. Collaborating with Product Engineering teams and advocating for observability and reliability.
Responsibilities
Integrating tightly with our Product Engineering teams
Following SRE practices and maintaining high standards of compliance
Implementing a new standard of observability utilising SLI/SLO/Error Budgets
Continually evolving our observability platforms for greater coverage
Using a code-first approach to builds and changes to reduce toil
Advocating a strong focus on availability, reliability and uptime
Liaising with and embedding in the Engineering teams to continually evolve metrics
Working towards planned roadmap goals
Actively taking part in the daily stand-ups and keeping sprints on track
Keeping documentation up to date in JIRA and Confluence
Taking part in SRE Incident Management processes
Acting as a key Incident Commander within the Incident Management process
Taking part in the SRE on-call rotation
Ensuring a focus on cost efficiency for the platforms & services
Working with team members to foster collaboration and ongoing communication with stakeholders
Requirements
At least 5 years of experience in DevOps or SRE, with a keen interest in growing as a Site Reliability Engineer
Experience with AWS or other cloud providers
Enterprise experience in HA environments
Automation skills through Terraform, Python, Bash or similar
Wide-reaching SRE skills and a deep understanding of SRE practices
A strong understanding of SQL, PHP, Kubernetes and CI/CD
Observability product experience (e.g., Datadog)
Experience managing services using SLIs, SLOs and error budgets
Ability to work both independently and as part of a team
Ability to work under pressure and be highly reliable
Adaptability and flexibility to change in a fast-moving environment
An ability to learn new tools and processes quickly and impart that knowledge
Benefits
Screening interview with the Talent Partner and Head of SRE
Final interview with the Head of SRE and the Director of Infrastructure.
Be comfortable. Be you. At Reward Gateway, we want all our employees to feel comfortable bringing their passion, creativity and individuality to work. We value all cultures, backgrounds, and experiences, as we truly believe that diversity drives innovation. Express yourself, join our community and help us Make the World a Better Place to Work.
We hire BETTER. From perks to people, our BETTER approach to hiring earns us more trust, happier people and more world-class talent that helps us to make the world a better place to work.
Full-Stack Engineer enhancing engineering productivity at Fidelity. Building internal tools for SRE teams to improve operational efficiency and reliability.
DevOps Engineer at Cloudogu working with development and operations for reliable software delivery. Focusing on CI/CD, infrastructure automation, and platform services in an agile environment.
Jr. DevOps Engineer supporting and improving CI/CD pipelines and Linux systems at Swift. Collaborating with senior engineers in a hands-on learning environment.
Senior DevOps Engineer I managing automation tooling and multi-cloud infrastructure at Spring Health. Collaborating with AI and Infrastructure teams in a hybrid Seattle office.
Site Reliability Engineer for a cloudified backup platform using Commvault technology at Expleo. Joining a dynamic team to ensure backup infrastructure scalability and reliability.
Site Reliability Engineer responsible for designing and maintaining scalable services with high availability. Collaborating with development teams to enhance reliability and operational excellence.
Technical Staff leading the architecture, reliability, and modernization of enterprise ALM and DevOps tools. Driving strategy and influencing product development in collaboration with various teams.
Site Reliability Engineer responsible for reliability and availability, collaborating with development teams on scalable systems. Applying software engineering practices to improve production operations.
DevOps Engineer in the Security Data and AI Lab at Lloyds Banking Group, using data and cloud infrastructure to improve product operations and customer service.