Site Reliability Engineer responsible for leading technology teams at SS&C. Delivering scalable and resilient infrastructure platforms in the financial services and healthcare technology sectors.
Responsibilities
Collaborate with Technology Infrastructure teams to build and operate reusable, cloud-native platforms
Work with business units and technical teams to improve application availability, observability, and reliability
Enhance platform reliability through automatic problem detection and self-healing systems
Use SLOs, SLIs, and KPIs to guide prioritization and measure impact
Eliminate toil using intelligent automation and agentic workflows
Conduct blameless retrospectives and share learnings across the organization
Foster a culture of ownership and continuous learning
Integrate DevSecOps, zero-trust principles, and policy-as-code into every pipeline
Produce and promote Architecture Decision Records (ADRs) and Cloud Well-Architected Frameworks
Requirements
5+ years of professional experience in an SRE role
3+ years in financial services or other regulated industries preferred
Minimum of a Bachelor’s degree in Computer Science, Engineering, or a related field
Proven expertise in architecting, designing and operating private cloud environments (e.g., VMware, OpenStack, OpenShift Virtualization) and Kubernetes clusters
Hands-on experience building, deploying, and operating infrastructure-as-code platforms
Experience with CI/CD pipelines and observability platforms (e.g., Prometheus, Splunk)
Strong understanding of modern systems reliability standards and practices
Familiarity with financial services regulatory frameworks and their impact on infrastructure design and operations
Familiarity with structured naming conventions and asset management for global infrastructure
Experience with financial-grade network segmentation, micro-segmentation, and zero-trust architecture
Certifications such as TOGAF, AWS Certified Solutions Architect, VMware VCP, or Red Hat Certified Architect are a plus
Familiarity with ISO 27001, NIST 800-53, and other security frameworks is a plus
Senior Lead Site Reliability Engineer overseeing critical systems stability and incident management. Leading Java application reliability and supporting a dynamic technology environment.
Infrastructure Architect connecting clients and Kyndryl. Leading projects from start to finish and ensuring technical solutions meet client needs.
DevOps Engineer automating and configuring network monitoring and automation solutions for Telia’s telecom operations in Finland. Ensuring performance, resilience, and high observability of critical platforms.
Client Services Consultant specializing in DevOps Mainframe Operations with experience in automation best practices. Analyzing Life Cycle Management data needs and evaluating solutions for Endevor-related operations.
Senior AWS DevOps Engineer at LexisNexis shaping a global CI/CD platform. Collaborating with teams to deliver secure, reliable, and scalable delivery pipelines.
Cloud Engineer at MetroStar focusing on building and securing cloud-native systems. Managing Kubernetes workloads and CI/CD pipelines in Agile teams with an emphasis on security.
Senior Engineer, Cloud Engineering role focused on AWS migration and automation. Collaborating with teams to innovate cloud patterns and infrastructure best practices.
Senior Operations Engineer driving efficiency and reliability in NVIDIA's global business operations. Collaborating with IT subsystems and automating operational workflows for organizational impact.
Lead or Senior DevOps Developer joining Boeing Defense, Space and Security for advanced technology missions. The role involves CI/CD, cloud systems design, and collaboration with government customers.
Associate Site Reliability Engineer supporting the reliability and performance of global IT infrastructure at Exegy. Engaging with senior engineers and learning foundational systems engineering skills.