Site Reliability Engineer ensuring smooth operations for banking systems at GFT. Working on production system access, deployment, and observability in AWS and Kubernetes environments.
Responsibilities
Participate in on-call rotations to provide support for critical systems.
Engineers are required to work on a rotating 2-2-2 schedule: 2 morning shifts followed by 2 days off, 2 afternoon shifts followed by 2 days off, and 2 night shifts followed by 2 days off.
Morning: 09:00 AM - 06:00 PM
Afternoon: 05:00 PM - 02:00 AM
Night: 01:00 AM - 10:00 AM
Resolve system incidents when they occur.
Deploy changes into staging and production environments.
Work with Platform Engineers to understand the changes.
Develop deployment pipelines for changes.
Understand the changes and develop observability (monitoring and alerting) for them.
Develop resiliency testing solutions and conduct resiliency tests.
Continuously enhance the monitoring solution.
Create and update operation runbooks.
Automate operation runbooks.
Requirements
Strong experience with Amazon Web Services
Strong experience with and understanding of Kubernetes
Scripting skills with Python or Bash
Experience with continuous deployment tools such as Harness (good to have)
Experience with infrastructure as code (IaC) tools such as Terraform
Experience with observability solutions such as Prometheus & Grafana, SumoLogic (good to have)
Cloud DevOps Engineer playing a pivotal role in developing migration plans for Coast Guard Cloud Architecture. Collaborating with teams to ensure effectiveness and best practices in cloud implementation.
Reliability Engineer III at Daimler Truck developing propulsion technology solutions for electrified and conventional axle components. Leading testing and validation for complex powertrain systems.
Electrical Reliability Engineer at Marathon Petroleum maintaining electrical equipment and systems. Collaborating with cross-functional teams and ensuring compliance with electrical codes and standards.
Senior DevOps Engineer focused on GCP platform engineering at healthtech startup. Collaborating with teams to enhance compute and networking capabilities.
SME DevOps Engineer delivering enhancements for enterprise data and analytics products across DoD organizations. Collaborating with government and industry partners to translate strategic requirements into scalable solutions.
DevOps Engineer designing CI/CD pipelines and managing Azure cloud infrastructure for leading organizations. Collaborating with global teams and automating deployment processes across projects.
Senior DevOps professional at iugu managing system reliability and performance in a dynamic environment. Collaborating with development teams and automating processes for efficiency.
Site Reliability Engineer maintaining stability and availability of healthcare staffing platform while collaborating with engineering teams on AWS migration projects.
Site Reliability Engineer maintaining the ShiftKey Marketplace platform while ensuring its stability and availability. Collaborating on infrastructure projects and support with a remote-first approach.
Site Reliability Engineer ensuring platform stability and managing AWS migration. Focused on hands-on maintenance work and engineering automation for healthcare staffing platform.