Site Reliability Engineer focusing on AWS cloud services and SRE practices. Collaborating on performance, availability, and observability in a hybrid work environment.
Responsibilities
Work on SRE initiatives and activities in an AWS cloud environment;
Define and monitor Service Level Agreements (SLAs), Service Level Indicators (SLIs) and performance metrics;
Expand and consolidate Site Reliability Engineering (SRE) practices;
Assess service maturity and define optimization strategies and process adjustments;
Monitor technical and business metrics, ensuring availability, resilience and performance of IT services;
Participate in modernization and cloud migration projects;
Work on projects and design architectures focused on Observability.
Requirements
Experience with Observability and APM tools such as Grafana, AppDynamics, Dynatrace, Prometheus, DataDog, ELK and Zabbix;
Experience in log analysis and troubleshooting connectivity and integrations between applications and partners;
Experience optimizing cost and performance of cloud services on AWS;
Focused on reliability, availability and security of services.
Benefits
Multi-benefits card – you choose how and where to use it.
Scholarships for Undergraduate, Postgraduate, MBA and language courses.
Certification incentive programs.
Flexible working hours.
Competitive salaries.
Annual performance review with a structured career plan.
Cloud Operations Engineer supporting and maintaining multi-cloud public infrastructure for enterprise customers. Working in a structured ITIL environment and contributing to operational excellence.
DevOps Engineer building and maintaining authentication platforms in multi-cloud environments. Using technologies like Terraform, Ansible, and Python for automation and optimization.
Cloud Engineer developing Infrastructure-as-Code with Terraform and Azure DevOps. Managing Azure infrastructure and leading incident response within cross-functional teams.
DevSecOps Engineer at Skillfield working on secure CI/CD pipelines for mobile-first delivery. Collaborating with teams to embed security and automation in the delivery lifecycle.
Lead DevOps Engineer focused on AWS and Azure data platform solutions. Collaborating with teams to deliver scalable, secure, and highly available solutions.
DevOps Engineer working at GRÜN Software Group to automate and maintain stable infrastructures. Collaborating with teams to improve deployments and processes for better performance.
Linux System Administrator managing IT infrastructures for educational institutions and research. Collaborating on DevOps and HPC projects while ensuring system security and performance.
Azure SRE Engineer responsible for designing and maintaining secure, scalable Azure cloud infrastructure. Driving automation and operational excellence for leading organizations in technology transformation.
Senior Manager of Site Reliability Engineering overseeing the Workday Kubernetes-based platform. Leading teams while ensuring high availability and collaborating with federal agencies.