Join the Ellison Institute of Technology (EIT) in Oxford as a Senior ML Infrastructure Engineer. Build and operate high-performance ML infrastructure to enable scientific breakthroughs.
Responsibilities
**Day-to-day, you might:**
Build, operate, and continuously optimise our high-performance GPU training and inference clusters, focusing on robust, high-availability scheduling, isolation, and automated lifecycle management.
Drive systems design and implementation for high-throughput data paths, optimising I/O, caching, and data locality across compute and storage (including our current Lustre implementation).
Proactively benchmark, profile, and resolve performance bottlenecks across the compute, network, and orchestration layers to maximise efficiency for distributed training and inference.
Establish comprehensive observability, resilience, and automated security controls to ensure compliance and robust operation of sensitive research environments.
Partner with Research, Data, and Applied teams to forecast capacity and cost for GPU and storage needs, setting quotas and streamlining ML experimentation pipelines.
Requirements
**What makes you a great fit:**
Proven experience leading the design, build, and operation of high-performance ML compute clusters at scale
A proactive, autonomous approach to systems design, with the proven ability and desire to ideate, co-create, and implement optimal solutions
Exposure to migrating or transforming ML infrastructure from traditional schedulers to modern, containerised systems
Expertise with high-throughput storage systems for ML/HPC workloads
Expert-level understanding of GPU architecture, high-speed networking for distributed training, and performance profiling to resolve bottlenecks
A solid grasp of IaC and CI/CD practices (e.g., Terraform, Argo CD)
**It would also be great if you had:**
Experience with Lustre
Benefits
**We offer the following salary and benefits:**
Enhanced holiday pay
Pension
Life Assurance
Income Protection
Private Medical Insurance
Hospital Cash Plan
Therapy Services
Perkbox
Electric Car Scheme
**Why work for EIT:**
At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. We value emotional intelligence, empathy, respect, and resilience, and encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!