Senior Engineer responsible for building and maintaining the hybrid platform for autonomous vehicles. Focused on reliability and automation for critical systems at GM.
Responsibilities
Implement and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical hybrid services, ensuring the platform meets rigorous uptime and readiness targets.
Drive the automation of foundational on-prem utilities (including DHCP, PXE, and NTP) so that the fleet of remote CI-based hardware benches is always provisioned and in a ready state.
Build and optimize observability stacks (dashboards and alerting) to detect system degradation before it impacts developers, focusing on reducing Mean Time to Recovery (MTTR).
Own the integrity of data ingestion paths from physical test benches through the secure cloud network, ensuring dependencies are stable and performant.
Identify and eliminate "human duct tape" by replacing manual, repetitive tasks with robust automation primitives and self-service tools.
Provide technical guidance and peer reviews for other engineers, fostering a culture of high-quality code and resilient architecture.
Requirements
Proven professional experience in Site Reliability Engineering (SRE) or DevOps, ideally within a hybrid cloud environment.
Strong proficiency in Linux systems administration and the management of core networking services (DHCP/PXE).
Hands-on experience with Infrastructure as Code (IaC) and configuration management tools (e.g., Chef, Ansible, or Terraform).
Ability to break down broad technical challenges into clear, implementation-ready initiatives with minimal supervision.
A growth mindset, with a commitment to continuous learning and upskilling in a high-velocity environment.
Experience with Kubernetes (k8s) and monitoring high-throughput data pipelines.
Cloud Operations Engineer supporting and maintaining multi-cloud public infrastructure for enterprise customers. Working in a structured ITIL environment and contributing to operational excellence.
DevOps Engineer building and maintaining authentication platforms in multi-cloud environments. Using technologies like Terraform, Ansible, and Python for automation and optimization.
Cloud Engineer developing Infrastructure-as-Code with Terraform and Azure DevOps. Managing Azure infrastructure and leading incident response within cross-functional teams.
DevSecOps Engineer at Skillfield working on secure CI/CD pipelines for mobile-first delivery. Collaborating with teams to embed security and automation in the delivery lifecycle.
Lead DevOps Engineer focused on AWS and Azure data platform solutions. Collaborating with teams to deliver scalable, secure, and highly available solutions.
DevOps Engineer working at GRÜN Software Group to automate and maintain stable infrastructures. Collaborating with teams to improve deployments and processes for better performance.
Linux System Administrator managing IT infrastructures for educational institutions and research. Collaborating on DevOps and HPC projects while ensuring system security and performance.
Azure SRE Engineer responsible for designing and maintaining secure, scalable Azure cloud infrastructure. Driving automation and operational excellence for leading organizations in technology transformation.
Senior Manager of Site Reliability Engineering overseeing Workday's Kubernetes-based platform. Leading teams while ensuring high availability and collaborating with federal agencies.