Lead Site Reliability Engineer at S&P Global's Cloud Engineering team. Responsible for designing and maintaining cloud infrastructure and ensuring the performance of cloud-based systems.
Responsibilities
Develop monitoring solutions to provide complete system coverage
Strategically plan and organize team deliverables
Practice and document disaster recovery scenarios
Streamline our software build and release pipeline
Engage in post-incident reviews to learn from failures and solve the root cause
Able to be available for incident escalations
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field.
+10 years of relevant experience in related roles.
AWS Certified Solutions Architect – Professional certification
Expertise in scripting languages such as Python, Bash, and PowerShell
Advanced experience with infrastructure-as-code tools, including Terraform and CloudFormation
Strong time-management skills with the ability to prioritize tasks and meet deadlines
Excellent troubleshooting abilities to diagnose and resolve complex issues efficiently
Clear and effective communication skills, especially when explaining complex technical concepts
Proven leadership experience with the ability to guide and mentor a team of engineers.
Benefits
Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.
Senior Site Reliability Engineer at Diligent leading reliability, automation, and observability across cloud infrastructure. Build tools for incident response and enhance performance in fast - paced environments.
Perception Deployment Engineer deploying deep learning models on embedded systems at Caterpillar. Collaborating with cross - functional teams for integration and optimization of perception modules in vehicles.
Principal Site Reliability Engineer at AT&T required to design scalable solutions for critical operations with minimal downtime. Collaborating with teams to monitor and improve system performance in cloud environments.
DevOps Engineer managing AI SaaS infrastructure at a high - growth European company. Supporting AI model deployment and ensuring platform security and compliance with multiple systems integration.
Engineering Manager leading teams for observability platforms at LexisNexis. Owns operational excellence across software delivery lifecycle in Raleigh, NC.
Reliability Engineer optimizing site facility infrastructure and utility systems at Roche. Conducting root cause analyses and developing maintenance plans to enhance reliability and efficiency.
DevOps SME designing, implementing, and operating multi - cloud platforms for The Missing Link. Collaborating with engineering, security, and operations teams while embedding DevOps best practices.
Site Reliability Engineer improving reliability of cloud infrastructure for an AI - specialized company. Taking ownership of monitoring and incident response processes in hybrid - working style.
DevOps Engineer leading automation for sophisticated release/deployment pipelines at Securonix. Focused on Python, Ansible, and cloud services to enhance security operations.
Senior Analyst on Data Platform DevOps at AIMCo, responsible for building data operations and collaborating with teams on innovative solutions. Focused on ensuring data quality and integrity across technologies.