Site Reliability Engineer at Personio focusing on automated infrastructure and collaboration across engineering teams. Shape the future of HR technology with meaningful impact and ownership.
Responsibilities
Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
Prepare services for production by engaging in system design reviews, developing shared frameworks and platforms, planning capacity and conducting launch assessments.
Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
Ensure sustainable scalability through automation, driving continuous evolution to increase reliability and delivery speed.
Collaborate with product and engineering teams to define SLOs and error budgets and to ensure services are reliable, scalable, and observable.
Lead incident management processes, including on-call rotations, managing outages, driving post-mortems and conducting root cause analysis.
Identify and reduce toil through process automation, creating playbooks and automated runbooks to reduce MTTR.
Define resilience strategies and implement chaos testing to proactively uncover weaknesses and validate recovery strategies.
Mentor, train and grow the community. Guide engineers across teams in reliability best practices and tooling.
Requirements
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
8+ years of experience with SaaS software development in distributed systems using languages such as Kotlin/Java, TypeScript, and Python, and technologies such as IaC, Docker, and Kubernetes.
2+ years’ experience in an SRE or similar role designing, operating, analyzing, and troubleshooting distributed systems in agile environments.
Strong knowledge of modern application and infrastructure monitoring concepts (Datadog and/or AWS experience advantageous).
Systematic problem-solving and debugging skills, with a strong sense of ownership and a bias towards establishing mechanisms that can scale across the entire company.
Excellent written, verbal, and documentation skills.
Collaborative team player, able to communicate effectively across disciplines.
Benefits
Receive a competitive reward package – reevaluated each year – that includes salary, benefits, and pre-IPO equity.
Enjoy 28 days of paid vacation, plus an additional day after 2 and 4 years.
Make an impact on the environment and society with 1 (fully paid) Impact Day.
Receive generous family leave, child support, mental health support, and sabbatical opportunities.
We enjoy gathering for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.