Senior ML Platform Engineer building and scaling machine learning infrastructure for AI applications. Responsible for LLM deployment, Kubernetes management, and mentoring engineering teams.
Responsibilities
Build and scale machine learning infrastructure focused on Large Language Models (LLMs) and AI applications
Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs
Architect and manage Kubernetes clusters for ML workloads
Ensure 99.9%+ uptime for ML platforms through robust monitoring
Mentor junior engineers and data scientists on platform best practices
Collaborate with data scientists and product engineering teams
Present technical solutions and platform roadmaps to leadership
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
5+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
2+ years of hands-on experience with machine learning infrastructure and deployment at scale
1+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Benefits
Comprehensive Total Rewards program that offers personalized coverage
Health insurance
401(k) savings plan with a 6% match, vested from day one
Performance and recognition-based incentives
Tuition assistance
Workplace flexibility, including the GEICO Flex program, which allows work from anywhere in the US for up to four weeks per year
Senior Staff Machine Learning Engineer leading technical architecture for GEICO's AI Agent Platform. Driving innovation and enhancing productivity for internal associates and customers.
Staff Machine Learning Engineer developing the next generation of AI Agent OS and SDKs for GEICO. Key responsibilities include architecting scalable systems and implementing observability frameworks.
Senior Machine Learning Engineer at Bumble developing scalable AI systems for personalized user interactions. Leading machine learning model development and deployment from exploration to production.
Lead Machine Learning Engineer at Bumble shaping user connections through machine learning. Driving end-to-end AI solutions while mentoring engineers in a hybrid work environment.
Designing and operating cloud-based MLOps capabilities supporting analytical and generative AI models. Collaborating with data science and business teams for high-impact AI solutions.
Machine Learning Engineer analyzing data structures and developing ML models for customer profiling in Azerbaijan. Collaborating on probabilistic modeling and data quality improvement.
Machine Learning Engineer at HackerRank working on integrity systems to improve model quality. Collaborating on strategies for new signals like audio analysis and behavioral anomalies.
Machine Learning Engineer developing integrity systems for assessing model quality at HackerRank. Collaborating on multimodal signal processing and improving model performance.
Architect designing enterprise-grade AI/ML architectures for Quantiphi. Leading AI applications and ML strategy with a focus on scalability, security, and integration.
Software Engineer for ML Infrastructure at Slack, architecting systems to support large-scale AI deployment and reliability. Engaging in deep systems engineering focused on the ML lifecycle and infrastructure scalability.