SRE / DevOps professional specializing in cloud automation and observability to ensure operational excellence and collaboration with Development and Infrastructure teams at GFT.
We are looking for an SRE/DevOps professional with solid experience in automation, observability and reliability practices, working in cloud environments with a strong focus on AWS.
This professional will play a strategic role in ensuring availability, performance, security and operational efficiency, working closely with Development and Infrastructure teams.
Responsibilities
Work in partnership with the Development team, supporting the building, maintenance and evolution of applications;
Expand, optimize and evolve CI/CD pipelines;
Troubleshoot and analyze incidents using APM and observability tools;
Work daily with cloud technologies, primarily AWS;
Lead initiatives to optimize the cost and performance of services;
Ensure the reliability, availability, security and scalability of applications and infrastructure;
Analyze existing architectures and propose structural improvements;
Identify processes that can be automated and implement the automation;
Promote best practices and support DevOps and SRE culture across the organization.
Requirements
Strong experience with AWS, including: Cognito, Aurora PostgreSQL, EKS, Lambda, S3, API Gateway, DynamoDB, EC2, DocumentDB, SNS, and OpenSearch;
Experience with messaging and streaming: RabbitMQ, SQS, Kafka, Kinesis;
Experience with Infrastructure as Code (IaC): CloudFormation and Terraform;
Experience in rightsizing resources, provisioning new services, optimizing workloads and cluster architecture;
Knowledge of Windows Server (Active Directory, IIS, Windows Services);
Experience with CI/CD tools such as Jenkins and Azure DevOps;
Knowledge of Shell scripting and Python;
Advanced experience in SRE (Site Reliability Engineering) practices;
Experience with complex automations or large-scale pipelines;
AWS, Kubernetes, DevOps or SRE certifications;
Previous experience in high-criticality financial or corporate environments;
Experience with other clouds: GCP and Azure;
Knowledge of CI/CD with the AWS stack and GitLab CI;
Experience with SQL and NoSQL databases, including PostgreSQL;
Development experience with Kotlin, Java, Go and Spring Boot;
Experience with observability tools: Datadog, Grafana, Prometheus, Zabbix, New Relic, Dynatrace;
Knowledge of Big Data, especially the AWS stack.
Benefits
Multi-benefit card – you choose how and where to use it.
Tuition assistance for undergraduate, graduate, MBA and language courses.
Certification incentive programs.
Flexible working hours.
Competitive salaries.
Annual performance review with a structured career plan.
International career opportunities.