Senior Site Reliability Engineer role using modern Kubernetes and cloud-native technologies to deliver high reliability and scalability. You will solve platform challenges and help improve managed services.
Responsibilities
Design and implement observability solutions using Prometheus, Loki and Mimir, including defining meaningful alerts
Analyze, troubleshoot and further develop custom Kubernetes controllers to ensure reliability and stability
Develop and maintain production applications with a focus on code quality, scalability and operational readiness
Operate, automate and continuously evolve the MKA platform with a focus on efficiency and maintainability
Enhance internal tooling to drive automation and reduce manual effort
Requirements
Experience operating highly available, business-critical applications in cloud and on-premises environments, including incident leadership
Strong Kubernetes knowledge and experience in cluster management
Experience with GitOps principles and ArgoCD for deployment and delivery workflows
Experience with Infrastructure as Code, particularly Terraform and Ansible
Proficient in Bash and/or Python for automation and tooling
Understanding of CI/CD pipelines, ideally with Tekton-based workflows
Very good German skills and good English skills (B2+) for technical collaboration
Nice to have
Experience programming in Go
Experience with Nix for development tooling and automation
Experience with Helm, Make and Git
Additional experience with cloud-native platforms, observability or platform automation
Benefits
Opportunity to gain deep hands-on Kubernetes experience
Freedom to solve challenges
Opportunities to share knowledge and continuously learn
Collaborative team environment
Internal show-and-tell sessions
Attendance at conferences such as KubeCon or Container Days
Job title
Senior Site Reliability Engineer – Kubernetes Platform