Staff Machine-Learning Infrastructure Engineer developing ML infrastructure for Voxel, enhancing workplace safety through AI and computer vision technology.
Responsibilities
Own data & labeling pipelines – architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality-tiered datasets that stay within cost constraints.
Build and operate training infrastructure – create multi-GPU / multi-node training frameworks (Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.).
Manage the full model lifecycle – stand up model registries, version control, evaluation suites, and continuous-learning loops that push updates from dev → staging → prod with zero-downtime rollbacks.
Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad.
Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely.
Partner with ML engineers on architecture decisions, from data schemas to inference optimizations, ensuring infra and research roadmaps stay tightly aligned.
Requirements
Bachelor’s degree (or higher) in Computer Science, EE, or a related field.
5+ years building and operating large-scale infrastructure, with at least 3 years focused on ML or data-intensive systems.
Proven record of designing highly available, distributed systems on Kubernetes (EKS, GKE, or on-prem).
Deep expertise with orchestration (K8s operators, Argo, Kubeflow) and cluster-scale storage / compute (S3, GCS, Ray, Spark, Dask).
Hands-on experience automating data-labeling or ground-truth workflows and maintaining dataset versioning.
Strong software-engineering fundamentals; familiar with best practices for testing, observability, and secure coding.
Senior Machine Learning Engineer at Bumble developing scalable AI systems for personalized user interactions. Leading machine learning model development and deployment from exploration to production.
Lead Machine Learning Engineer at Bumble shaping user connections through machine learning. Driving end-to-end AI solutions while mentoring engineers in a hybrid work environment.
Designing and operating cloud-based MLOps capabilities supporting analytical and generative AI models. Collaborating with data science and business teams for high-impact AI solutions.
Machine Learning Engineer analyzing data structures and developing ML models for customer profiling in Azerbaijan. Collaborating on probabilistic modeling and data quality improvement.
Machine Learning Engineer at HackerRank working on integrity systems to improve model quality. Collaborating on strategies for new signals like audio analysis and behavioral anomalies.
Machine Learning Engineer developing integrity systems for assessing model quality at HackerRank. Collaborating on multimodal signal processing and improving model performance.
Architect designing enterprise-grade AI/ML architectures for Quantiphi. Leading AI applications and ML strategy with a focus on scalability, security, and integration.
Software Engineer for ML Infrastructure at Slack, architecting systems to support large-scale AI deployment and reliability. Engaging in deep systems engineering focused on the ML lifecycle and infrastructure scalability.
Machine Learning Engineer at Winnow developing AI solutions for food waste reduction. Collaborate with cross-functional teams and leverage cutting-edge technologies in food recognition.
Senior Engineer developing AI/ML solutions to enhance patient care at Edwards Lifesciences. Collaborating with cross-functional teams to deliver impactful technologies in healthcare.