ML Infrastructure Engineer at ChipStack responsible for building training pipelines for LLMs. Collaborating with chip designers and software engineers in a fast-moving startup environment.
Responsibilities
Build the core infrastructure that enables training, fine-tuning, evaluation, and deployment of LLMs across cloud and on-premise environments
Work alongside highly experienced chip designers, ML scientists, and other top-notch engineers
Contribute to solving some of the hardest problems in chip design
Requirements
5+ years of experience in ML infrastructure or adjacent roles
Deep expertise in Python and experience with training frameworks like PyTorch or TensorFlow
Strong systems engineering skills and experience with distributed training, data pipelines, and performance optimization
Experience deploying ML models to production (REST APIs, batch jobs, streaming pipelines)
Proficiency with cloud platforms (e.g., GCP, AWS) and containerized systems (Docker, Kubernetes)
Experience managing GPU/TPU workloads efficiently
Good communication skills and the ability to work directly with engineers and customers
Prior experience training or fine-tuning LLMs
Experience setting up observability, monitoring, and evaluation pipelines for ML models
Machine Learning Engineer designing and implementing AI systems focused on Japanese language challenges at Woven by Toyota. Involves technical R&D, system design, and collaboration with cross - functional teams.
Principal Software Engineer leading MLOps within Analytics Platform at Sun Life. Focused on AWS and machine learning operations, collaborating across technical and business teams.
Machine Learning Engineer designing and optimizing deep learning models for safety - critical environments at Destinus. Shaping the future of high - speed, autonomous flight technologies.
Machine Learning Engineer optimizing personalization systems for Spotify's audio streaming service. Collaborating with cross - functional teams to enhance user experience and deliver recommendations.
Principal Machine Learning Engineer developing ML and GenAI solutions in a cloud - native environment at Flexera. Leading a high - impact team and driving operational excellence for ML infrastructure.
Senior ML Platform/Ops Engineer building AI - powered ML pipelines for a dynamic Ed - Tech company. Collaborating with ML scientists and engineers to ensure reliable deployment and observability.
Senior ML Platform/Ops Engineer building ML systems for AI - powered learning at Preply. Productionizing machine learning with high reliability, performance, and observability in a hybrid environment.
Machine Learning Engineer developing advanced Deep Learning models for autonomous driving technology at Mobileye. Collaborating in a high - end algorithmic engineering team on critical computer vision challenges.
Machine Learning Engineer focusing on vulnerabilities and security of AI systems at Carnegie Mellon University. Collaborating with a team to build robust prototypes and provide solutions for government sponsors.