Hybrid Software Engineer, Compute Infrastructure

About the role

  • Build and operate the compute infrastructure that powers OpenAI’s AI research, optimizing Kubernetes clusters and ensuring reliability in supercomputing environments for advanced AI workloads.

Responsibilities

  • Spin up and scale large Kubernetes clusters, including automation for provisioning, bootstrapping, and cluster lifecycle management
  • Build software abstractions that unify multiple clusters and present a seamless interface to training workloads
  • Own node bring-up from bare metal through firmware upgrades, ensuring fast, repeatable deployment at massive scale
  • Improve operational metrics, for example by reducing cluster restart times (e.g., from hours to minutes) and accelerating firmware or OS upgrade cycles
  • Integrate networking and hardware health systems to deliver end-to-end reliability across servers, switches, and data center infrastructure
  • Develop monitoring and observability systems to detect issues early and keep clusters stable under extreme load

Requirements

  • Experience as an infrastructure, systems, or distributed systems engineer in large-scale or high-availability environments
  • Strong knowledge of Kubernetes internals, cluster scaling patterns, and containerized workloads
  • Proficiency in compute infrastructure concepts (compute, networking, storage, security) and in automating cluster or data center operations
  • Bonus: background with GPU workloads, firmware management, or high-performance computing

Benefits

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays and multiple coordinated, paid company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided

Job title

Software Engineer, Compute Infrastructure

Job type

Experience level

Mid level, Senior

Salary

$230,000 - $405,000 per year

Degree requirement

Bachelor's Degree

Location requirements
