Site Reliability Engineer at Zefr applying cloud infrastructure expertise, collaborating on ML applications and fostering DevOps culture. Building scalable systems for responsible marketing in social environments.
Responsibilities
Support and build systems and tools that enable other engineers to generate, deploy, and manage product features and models both quickly and safely.
Deploy and support a multi-cloud, microservice architecture, including infrastructure tailored for ML workloads, deployed via GitHub Actions, Argo CD, and Kubernetes.
Collaborate with other engineers, particularly the Machine Learning team, to architect secure, resilient, scalable, and cost-efficient applications and ML systems/pipelines in AWS and GCP.
Foster and push our DevOps culture and philosophy by encouraging continuous improvement across all engineering teams.
Proactively maintain the health of production environments, including monitoring application performance and resource utilization.
Participate in a 24/7 on-call rotation and respond to system performance issues and outages.
Debug code at the application and infrastructure level.
Mature our CI/CD workflows and release process.
Maintain a forward-thinking approach by actively researching and proposing new solutions.
Propose and review engineering Requests for Comments (RFCs) to drive engineering architecture and practices.
Requirements
7+ years of experience designing, managing, deploying, and supporting cloud infrastructure in a production environment using major public cloud providers (GCP experience is a huge bonus)
Knowledge of GitOps, including an understanding of modern CI/CD pipelines, techniques, and technologies (GitHub Actions, GitLab, CircleCI, Argo CD, Flux)
Proficiency with IaC and configuration management tools (Terraform, Terragrunt, OpenTofu, Crossplane, Pulumi)
Production experience architecting, managing, deploying, and supporting container-based workloads on Kubernetes clusters
Strong problem-solving experience, focusing on automation
Proven track record of building and scaling reliability practices, including SLO/SLI frameworks, incident management, and capacity planning
Extensive production experience with observability platforms and practices (Prometheus, Grafana, Chronosphere, Datadog, OpenTelemetry); ability to design monitoring strategies for complex distributed systems
Knowledge of cloud networking (mesh, NAT, load balancers, API gateways, proxies, etc.), cloud security, and cost optimization strategies
Strong written and verbal communication, organization, and documentation skills
Benefits
Flexible PTO
Medical, dental, and vision insurance with FSA options
Company-paid life insurance
Paid parental leave
401(k) with company match
Professional development opportunities
10+ paid holidays
Summer Fridays (we leave early)
In-office, hybrid, and fully-remote work options available
In-office lunches and lots of free food
Optional in-person and virtual events (we like to celebrate!)
DevOps Intern at CCC Intelligent Solutions focusing on cloud infrastructure management and AI model deployment. Gain hands-on experience in DevOps and cloud automation with AWS and Azure.
System Administrator and DevOps Engineer managing ongoing systems in web-based software development. Collaborating on infrastructure and supporting product development with a small team in Bremen.
DevOps Engineer collaborating with teams to ensure reliable software delivery. Focus on CI/CD workflows and platform services within a hybrid work environment.
DevSecOps Engineer embedding security controls into AI development workflows within a financial services environment. Focus on securing AI-generated code through CI/CD pipeline enhancements and SAST integration.
DevOps Engineer ensuring the stability, scalability, and security of systems in all environments at a leading fintech company. Focused on automation and CI/CD to enhance operational excellence and collaboration.
DevOps Engineer focused on SRE principles and AWS-centric infrastructure for VERBI Software. Managing reliability, scalability, and modernization with a hands-on approach and technical mentorship.
Senior DevOps Engineer translating high-level requirements into scalable AWS architectures. Leading delivery of robust solutions and elevating engineering standards across the organization.
DevSecOps Engineer at Onepoint working with tech innovations for clients. Responsible for secure CI/CD pipelines and vulnerability management in collaboration with development teams.