Site Reliability Engineer joining Spotify’s Backstage team, building intelligent infrastructure for the world's most popular audio streaming service. Contributing to AI-native workflows and developer experience.
Responsibilities
Orchestrate the Fleet: Maintain and improve Portal’s SaaS infrastructure for reliability, security, and scalability. This covers the runtime environments supporting the platform and workflows powered by large language models.
Modern Infra-as-Code: Collaborate with senior engineers to build infrastructure on GCP and AWS using Terraform and emerging infrastructure-from-code patterns where agents assist in defining the stack.
Support Fullstack Systems: Operate in a modern web stack environment (TypeScript, React, Python). While this isn’t a frontend-heavy role, comfort with debugging fullstack systems and web infrastructure is key.
Reliability Engineering: Participate in on-call rotations to ensure systems meet reliability and availability goals, employing AI assistants to accelerate root cause analysis and incident resolution.
Collaborate & Innovate: Participate in the planning and delivery of technical projects, defining how infrastructure evolves to support the next wave of generative AI features.
Requirements
Cloud Native & AI Curious: Brings hands-on experience with cloud infrastructure (GCP or AWS) and IaC tools like Terraform, with an interest in LLMs, RAG, or agents in an operational context.
Systems Thinker: Understands distributed systems principles and how to operate them reliably at scale, specifically addressing the challenges posed by non-deterministic AI workloads.
Polyglot Practitioner: Experienced with at least one modern programming language (e.g., TypeScript, Java, Go, Python) and comfortable navigating codebases where AI-generated PRs are the norm.
Quality & Automation: Prioritizes code quality and reliability, looking for ways to build systems that test themselves and improve through automated feedback loops.
Growth Mindset: Eager to evolve as an engineer in a landscape where the definition of "operations" changes rapidly. Familiarity with open-source projects or building "coding assistant" bots is a plus.
Site Reliability Engineer at Thales managing secure cloud environments on AWS and GCP. Ensuring compliance, security, and availability of critical cloud platforms with DevSecOps practices.
DevOps Intern at CCC Intelligent Solutions focusing on cloud infrastructure management and AI model deployment. Gaining hands-on experience in DevOps and cloud automation with AWS and Azure.
System Administrator and DevOps Engineer managing live systems in web-based software development. Collaborating on infrastructure and supporting product development with a small team in Bremen.
DevOps Engineer collaborating with teams to ensure reliable software delivery. Focus on CI/CD workflows and platform services within a hybrid work environment.
DevSecOps Engineer embedding security controls into AI development workflows within a financial services environment. Focus on securing AI-generated code through CI/CD pipeline enhancements and SAST integration.
DevOps Engineer ensuring the stability, scalability, and security of systems in all environments at a leading fintech company. Focused on automation and CI/CD to enhance operational excellence and collaboration.
DevOps Engineer focused on SRE principles and AWS-centric infrastructure for VERBI Software. Managing reliability, scalability, and modernization with a hands-on approach and technical mentorship.
Senior DevOps Engineer translating high-level requirements into scalable AWS architectures. Leading delivery of robust solutions and elevating engineering standards across the organization.