Director of Site Reliability Engineering at Mastercard, overseeing resilience and operational excellence initiatives. Leading a high-performing team of technical leaders within CX Technology.
Responsibilities
Lead and develop a team of highly skilled people leaders and senior individual contributors within the CX Technology organization, fostering a culture of accountability, innovation, and continuous improvement
Define and drive the short-term and medium-term strategic vision for Site Reliability Engineering, aligning reliability, scalability, and operational efficiency initiatives with broader Mastercard technology and business objectives
Lead the design and execution of cross-functional initiatives that improve system resilience, automate operational processes, and mature incident management, problem management, and reliability engineering practices
Establish, evolve, and govern reliability standards, operational best practices, and control frameworks to ensure consistent adoption across engineering and delivery teams
Partner closely with engineering, product, architecture, and business stakeholders to embed reliability requirements into system design, development, deployment, and lifecycle management processes
Oversee major incident response and escalation efforts, ensuring rapid recovery, effective communication, and high-quality root cause analysis with actionable remediation
Promote proactive risk identification and mitigation through observability, capacity planning, resiliency testing, and automation-driven approaches
Champion continuous improvement by leveraging operational metrics, insights, and retrospectives to drive measurable improvements in availability, stability, and customer experience
Stay informed on industry trends, emerging technologies, and modern SRE practices, applying relevant innovations to advance Mastercard’s operational maturity
Manage goal setting, coaching, performance management, and talent development for people leaders and senior technologists, building a strong leadership pipeline and sustaining operational excellence at scale
Requirements
Proven experience leading Site Reliability Engineering, Production Engineering, or large-scale operations teams within complex, highly available, distributed technology environments
Strong people leadership background, including managing managers and/or senior technical leaders, with demonstrated success building high-performing, inclusive teams
Deep understanding of reliability engineering principles, including incident management, automation, telemetry, observability, resilience engineering, capacity planning, and service lifecycle management
Demonstrated ability to translate strategy into execution by evolving processes, programs, and policies to drive meaningful and measurable operational improvements
Experience partnering across engineering, product, and business functions to influence design decisions and embed reliability throughout the development lifecycle
Strong analytical and problem-solving skills, with a track record of driving root cause analysis and long-term corrective actions
Excellent communication and stakeholder management skills, with the ability to lead through influence at senior and executive levels
Passion for continuous improvement, operational discipline, and leveraging technology to reduce toil and improve system outcomes at scale
Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience; advanced degree preferred
Corporate Security Responsibility
Must abide by Mastercard’s security policies and practices
Ensure the confidentiality and integrity of the information being accessed
Report any suspected information security violation or breach
Complete all periodic mandatory security trainings