SRE Specialist ensuring reliability and stability of critical products and services at GFT. Seeking a professional with systemic vision and strong analytical skills for hybrid work in São Paulo.
Responsibilities
Define, maintain, and evolve SLIs and SLOs for critical APIs and services;
Manage and communicate error budget consumption, guiding release decisions;
Serve as a reference for balancing agility and operational stability;
Implement and improve monitoring, metrics, logging, and tracing practices;
Ensure actionable alerts and clear dashboards for service tracking;
Lead or support incident response and war rooms;
Structure incident response processes with a blameless approach;
Conduct postmortems and ensure execution of corrective actions;
Work to reduce MTTA, MTTR, and incident recurrence;
Automate operational workflows and eliminate repetitive tasks (toil);
Create runbooks, automations, and improvements in CI/CD pipelines;
Standardize rollout and rollback processes, as well as resilience testing;
Work in environments with Kubernetes/EKS, Azure DevOps, Kafka, and databases;
Support technical decisions together with Engineering and Architecture teams;
Optimize performance, capacity, and costs in infrastructure environments;
Promote best practices and raise SRE maturity across squads;
Collaborate with Architecture, DevOps/SRE Enablement, and Security teams;
Influence technical decisions based on data and metrics.
Requirements
Experience with SLIs, SLOs, error budgets, and incident management;
Strong troubleshooting and root cause analysis (RCA) skills;
Maintenance Reliability Engineer specializing in various automated electrical/mechanical components at Northrop Grumman. Supporting manufacturing operations in Magna, Utah, for optimal equipment performance.
Senior Systems Operations Engineer supporting Payments Modernization at Wells Fargo. Managing systems operations and ensuring resilience and observability in payment platforms.
Database Reliability Engineer managing PostgreSQL infrastructure that underpins transactions at Nodal Exchange. Ensuring data integrity and performance in a regulated financial environment.
Senior Information Security Analyst responsible for integrating security practices into development. Join Panvel’s team focused on securing applications and infrastructure.
DevOps Engineer leading the automation and adoption of DevOps best practices. Collaborating with teams to enhance agile delivery in cloud environments.
Senior Backend Engineer designing and developing backend services in Rust for Mobile DevOps. Collaborating on the Employee Superapp and implementing digital wallet services.
AI Development Operations Engineer responsible for the internal AI infrastructure empowering developers. Integrating AI systems into engineering workflows for efficient software design and maintenance.
Reliability Engineer responsible for availability and performance of U.S. Air Force Cloud services. Collaborates with teams to deliver reliable mission-critical systems in a hybrid environment.
Entry-level DevOps Engineer assisting in cloud infrastructure automation for an AI-powered security operations platform. Seeking passionate candidates with foundational knowledge in Terraform, Kubernetes, and CI/CD pipelines.
DevSecOps Engineer responsible for security in CI/CD pipelines for a global client network. Collaborating on security hardening of applications and automation processes.