Platform Engineer improving AWS infrastructure and GPU clusters for quantum simulations, collaborating with quantum researchers and shaping the platform's evolution.
Responsibilities
Own our AWS infrastructure end-to-end and actively shape how it evolves: building, not just maintaining.
Reduce friction in the deployment pipeline so developers can ship without infrastructure blockers.
Harden systems with intention: lock down IAM roles, container images, and authentication flows based on a clear understanding of where the real risks lie.
Implement monitoring and alerting that catches production issues before users notice them.
Make deployments faster to roll out, easier to roll back, and less prone to failure.
Lead incident response and post-mortems when necessary.
Make GPU clusters and other infrastructure invisible to the researchers running workloads on them.
Own CUDA compatibility and driver versions across heterogeneous GPU clusters.
Build standardized SLURM job submission workflows that researchers can use without help.
Package and containerize Python simulation code for reproducible execution.
Monitor job health across utilization, cost, and runtime efficiency.
Requirements
Experience: 5+ years in Platform Engineering, DevOps, or SRE roles.
Production AWS experience: Built and maintained systems on ECS/EKS, managed multi-account networking (VPCs, security groups), and dealt with real-world infrastructure complexity.
Infrastructure as Code: You've written and maintained Terraform (or Pulumi/CDK) in production, including applying ongoing changes as requirements evolved.
CI/CD: Improved build pipelines in production (reduced build times, increased reliability, made deployments easier to debug), including experience with GitLab CI, GitHub Actions, or equivalent.
GPU/HPC experience: Supported GPU workloads in production environments, including code optimization, CUDA debugging, and job scheduler setup.
Background in scientific computing, research infrastructure, ML platforms, or early-stage startups (especially research computing vendors).
Security & compliance experience: You've implemented auth systems (Auth0/Okta), managed encryption (KMS), or worked on FedRAMP/compliance-driven infrastructure. FedRAMP experience is a strong plus.
Exposure to quantum computing SDKs (Qiskit, Cirq, PennyLane) or hybrid classical-quantum workflows is a plus, but not required; genuine interest in quantum computing matters more than prior exposure.