Staff DevOps Engineer (Hybrid)

Posted last month

About the role

  • As a Staff DevOps Engineer at webAI, you will design and architect secure, scalable infrastructure for AI workloads, lead technical initiatives, and mentor engineers on cloud architecture and reliability best practices.

Responsibilities

  • Design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud (AWS, Azure, GCP) and hybrid environments
  • Build and maintain production-grade Infrastructure as Code (IaC) using Terraform, Ansible, or Pulumi, managing 100+ resources with GitOps workflows and automated validation
  • Design and operate production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization
  • Implement secure CI/CD pipelines with integrated security controls (SAST, DAST, vulnerability scanning, secrets management) and automated deployment workflows for containerized AI models
  • Lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift
  • Design comprehensive observability and monitoring using Prometheus, Grafana, ELK, or Datadog with distributed tracing, APM, and real-time alerting aligned to SLIs/SLOs
  • Implement security best practices including least-privilege access, encryption at rest/in transit, network segmentation, and automated compliance validation
  • Lead incident response and reliability initiatives: participate in the on-call rotation, conduct post-mortems, and drive continuous improvement in system reliability
  • Architect disaster recovery and business continuity strategies with automated backup, failover, and recovery processes
  • Develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns across teams
  • Mentor mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance
  • Drive technical documentation and knowledge sharing including runbooks, architecture decision records (ADRs), and infrastructure standards

Requirements

  • 7+ years of hands-on experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering, with a proven track record of architecting production systems
  • Expert-level proficiency with Docker, Kubernetes (CKA/CKAD preferred), and cloud-native technologies in production environments
  • 5+ years implementing Infrastructure as Code with Terraform, Ansible, or Pulumi, managing large-scale environments (50+ cloud resources)
  • Deep experience with cloud platforms (AWS, Azure, or GCP) including compute, networking, storage, and managed services
  • Proven experience building and scaling CI/CD pipelines with integrated security controls (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
  • Strong programming skills in Python (preferred for automation), Bash, or Go for infrastructure tooling and automation
  • Production experience with observability and monitoring tools: Prometheus, Grafana, ELK, CloudWatch, Datadog, or similar
  • Experience with MLOps workflows: model deployment automation, versioning, and lifecycle management
  • Demonstrated experience with GitOps methodologies and declarative infrastructure management
  • Strong understanding of security best practices: encryption, secrets management, identity and access management (IAM), network security
  • Excellent written and verbal communication skills for technical documentation and cross-functional collaboration

Benefits

  • Competitive salary and performance-based incentives
  • Comprehensive health, dental, and vision benefits package
  • 401(k) Match (US-based only)
  • $200/month Health and Wellness Stipend
  • $400/year Continuing Education Credit
  • $500/year Function Health subscription (US-based only)
  • Free parking for in-office employees
  • Unlimited PTO (subject to approval)
  • Parental Leave for Eligible Employees
  • Supplemental Life Insurance

Job title

Staff DevOps Engineer

Job type

Experience level

Lead

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
