Lead and grow a team of SREs to build automated, resilient infrastructure and reliable deployment pipelines.
Collaborate with development teams, oversee incident response, and drive DevOps best practices across the organization.
Responsibilities
Manage and lead a team of 6 passionate SREs
Implement tools and processes for deployment and industrialization (CI/CD, blue/green, canary, rollback, etc.)
Automate provisioning of a resilient infrastructure that meets product needs
Work with development teams to facilitate regular releases
Maintain services in operational condition; analyze and resolve performance and scalability issues (including load testing) for current and historical deployments
Oversee the application portfolio in collaboration with the Network Operations Center (NOC); manage access and security
Contribute to the evolution of the IT infrastructure (e.g., VMware to KVM migration and service offering) and reduce technical debt
Act as a DevOps advocate and help build a cross-team SRE community across the company
Share company information and communicate team activities
Define and maintain a clear, relevant team organization
Develop the team while avoiding micromanagement
Requirements
Minimum 3 years’ experience in a similar role
Proven managerial experience
Knowledge of industrialization processes, agile methodologies, GitFlow and DevOps best practices, with a solid understanding of system administration
Experience maintaining high-availability systems
Experience with on-call organization and incident response
Strong Linux skills; Windows knowledge is a plus
Proficiency with Infrastructure-as-Code: Terraform, Ansible
Experience with logging and monitoring: ELK (Elasticsearch, Logstash, Kibana), Prometheus
Hands-on experience with Docker, Kubernetes, Consul, Vault
Experience with messaging systems such as RabbitMQ
Experience with databases such as PostgreSQL, MongoDB, Elasticsearch
Good knowledge of backup and recovery systems
Strong verbal and written English skills
Empathetic and open-minded
Benefits
Dynamic and creative environment within international teams
Wide range of self-learning courses available on our e-learning platform
Opportunities to participate in local and international meetups and conferences