Technical Staff leading the architecture, reliability, and modernization of enterprise ALM and DevOps tools. Driving strategy and influencing product development in collaboration with various teams.
Responsibilities
Lead the architecture, reliability, and modernization of our enterprise ALM and DevOps tool ecosystem
Define and evolve HA, DR, and scaling architectures across all ALM tools
Build topology-aware designs and continuously measure and improve platform scalability, performance, and resilience
Ensure all tool services meet strict requirements for availability, reliability, and stability
Define and drive SLIs, SLOs, error budgets, and operational KPIs for every tool
Implement applied observability: actionable metrics, logs, traces, alerts, and dashboards tailored to each platform
Lead root-cause analysis, incident management, and continuous reduction in MTTD/MTTR
Architect Okta integrations across tools: SAML/OIDC/SCIM, entitlement frameworks, group/role mapping, and auditability
Ensure compliance with SRO/Security controls: hardening, secrets management, vulnerability remediation, and audit readiness
Drive the HVA program uplift for tools: threat modeling, compensating controls, and DR testing
Participate in post-quantum cryptography (PQC) readiness assessments and roadmap planning
Lead automation of infrastructure, upgrades, backup/restore, and operational workflows using Terraform, Ansible, GitOps, and tooling APIs
Create repeatable, consistent patterns for build pipelines (Jenkins/GitHub Actions) and artifact governance (Artifactory)
Work directly with SMEs and Engineering teams to standardize CI/CD, improve reliability, and reduce operational toil
Influence architecture and platform decisions across Engineering, Infrastructure, and Security teams
Mentor Architects, Principals, and Staff Engineers; create reference architectures and operational best practices
Partner with vendors and internal teams to evolve capabilities and ensure long-term platform health.
Requirements
15+ years with enterprise-scale DevOps/ALM platforms; deep expertise in several core tools (Jira, GHES, Jenkins, Artifactory, Kafka, SonarQube, Coverity, qTest)
Demonstrated ability to design and operate HA/DR architectures and deliver systems with 99.95%+ uptime
Strong background in SRE/SRO, applied observability, performance engineering, capacity planning, and scaling large tool deployments
Hands-on experience with Okta, identity federation, SCIM, authorization models, and enterprise entitlement design
Solid foundation in Linux, networking, containers, Kubernetes, databases, and distributed systems.
Benefits
Your life. Your health. Supported by your benefits.