Distinguished AI/ML Engineer leading technical development of agentic AI systems for Walmart Global Tech. Ensuring system reliability and operational excellence with advanced AI solutions.
Responsibilities
As a Distinguished AI/ML Engineer within Walmart Global Tech’s Reliability Engineering Organization, you will lead the technical development of next-generation agentic AI systems and intelligent automation solutions that ensure mission-critical reliability, scalability, and operational excellence across Walmart’s entire technology ecosystem.
Architect and implement cutting-edge machine learning platforms and autonomous agents that transform how we manage change and performance, and how we monitor, predict, and automatically resolve issues.
Design and implement multi-agent orchestration platforms that coordinate autonomous agents for change management, capacity planning, and performance optimization across e-commerce, supply chain, and in-store systems.
Develop self-healing infrastructure platforms that leverage AI to predict, prevent, and automatically remediate system issues.
Collaborate with engineering teams and leadership to reduce mean time to detect (MTTD) and mean time to restore (MTTR) through intelligent automation and predictive capabilities.
Requirements
Bachelor’s or Master’s degree in Engineering, Computer Science, or a related field, with 12+ years of hands-on experience in Reliability Engineering, AI/ML Engineering, or Platform Engineering.
Proven record as a senior individual contributor influencing architecture and driving technical excellence across large organizations.
Deep experience operating mission-critical systems, with expertise in MTTD, MTTR, availability, change management, model performance, and autonomous system reliability.
Expert-level AI/ML engineering experience, including deep learning frameworks such as TensorFlow and PyTorch and large-scale production ML deployments.
Advanced experience with agentic AI systems, including multi-agent frameworks, autonomous decision-making systems, LLM-based agents, and agent orchestration platforms.
Comprehensive Reliability Engineering expertise, including service management (Incident, Problem, and Change Management) and performance and capacity engineering for AI/ML systems.
Expert-level cloud engineering experience (Azure, GCP, AWS) with containerization (Kubernetes, Docker), serverless architectures, and cloud-native AI services.
Deep observability experience across distributed tracing, metrics, logs, APM, and AI-driven anomaly detection.
Strong platform engineering background including infrastructure as code, service mesh architectures, API gateways, and self-service developer platforms.
Benefits
Health benefits include medical, vision and dental coverage.
Financial benefits include 401(k), stock purchase and company-paid life insurance.
Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes.
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Machine Learning Engineer leading churn prediction model development for Architech. Collaborating with teams to optimize customer retention strategies and utilizing advanced AI and data analysis.
Automotive engineer developing solutions for electric mobility and autonomous vehicles at Expleo. Transforming the mobility industry through innovation in vehicle safety, efficiency, and sustainability.
AI Developer improving computer vision and machine learning models in sustainable agriculture. Working on innovative laser-based weed control technology at an Agritech startup.
Staff ML Engineer at Grindr developing scalable ML systems and enhancing user connections with emerging AI tools. Collaborating cross-functionally to drive innovation in the LGBTQ+ community.
Senior Staff ML Engineer at Grindr leveraging ML technologies to build systems enhancing user connections. Collaborating across teams to drive ML initiatives and architect recommendation systems at scale.
Machine Learning Engineer developing machine learning systems to enhance travel experience at Trainline. Collaborating in cross-functional teams to tackle complex real-world problems with data.
(Senior) Data / AI / ML Engineer at Building Radar developing scalable Data & AI solutions for the construction industry. Collaborating with engineers and designers to drive innovation through advanced technology.
Machine Learning Engineer designing VLM systems and video processing pipelines for TwelveLabs’ Video AI capabilities. Collaborating with researchers and engineering teams to enhance product solutions.
AI Engineer developing intelligent systems that power expense management for a fintech company based in Berlin. Designing scalable AI infrastructure and delivering real-world finance solutions.