Software Engineer for ML Infrastructure at Slack, architecting systems to support large-scale AI deployment and reliability. The role involves deep systems engineering focused on the ML lifecycle and infrastructure scalability.
Responsibilities
Design, build, and operate systems to train, serve, and deploy machine learning models at scale, with a focus on reliability, performance, and operational simplicity
Evolve GPU-backed inference infrastructure to support high-throughput, latency-sensitive workloads, including large-scale model serving
Architect and optimize distributed training and data processing systems using platforms such as Ray, Airflow, Spark, or similar technologies
Build and maintain Kubernetes-based platforms and orchestration layers using tools such as KubeRay, vLLM, and internally developed services
Architect solutions that bridge legacy systems with modern technologies while maintaining monolithic application stability
Develop robust monitoring, observability, and alerting for production ML workloads to ensure operational excellence
Partner closely with AI Platform, ML modeling, security, and product engineering teams to design infrastructure that supports evolving AI use cases
Provide technical leadership through design reviews and mentorship, and by setting engineering standards and long-term architectural direction for ML infrastructure
Author technical design and architecture documentation, and contribute thought leadership through engineering blog posts
Requirements
Significant professional experience in software engineering with a strong focus on infrastructure, backend systems, platform engineering, or MLOps
Deep experience building and operating distributed systems, including expert-level knowledge of Kubernetes and container-based platforms
Hands-on experience with modern ML infrastructure and serving stacks such as Ray or KubeRay, vLLM, or similar training and inference orchestration frameworks
Experience working with GPU infrastructure, including performance optimization and operational management at scale
Strong experience with data infrastructure and orchestration technologies such as Airflow, Spark, or similar systems
Experience building and operating cloud-native systems on public cloud platforms such as AWS, GCP, or Azure, including infrastructure as code
A demonstrated ability to drive technical direction for complex systems and balance short-term delivery with long-term architectural goals
Excellent written communication and the ability to thrive in an asynchronous, globally distributed infrastructure team