Senior Machine Learning Engineer at Disney designing machine learning models for self-healing infrastructure. Collaborating with cross-functional teams to enhance enterprise technology strategies.
Responsibilities
Work alongside our first-class applications, infrastructure & operations teams to understand current manual processes and business requirements
Architect, design, and implement reusable machine learning frameworks, patterns, and services that integrate into the enterprise automation and observability platforms
Design, train, and deploy machine learning models in distributed environments for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more, in order to surface leading indicators of failure
Build near-real-time inference pipelines that generate actionable insights from live telemetry, including continuous streams of metrics, logs, traces, and operational events
Create data abstractions and perform feature engineering on high-volume, high-cardinality telemetry data
Evaluate model performance using real production signals and continuously iterate to improve accuracy and reliability
Build closed-loop, event-driven systems where model signals trigger automated remediation actions
Partner with infrastructure and SRE teams to identify opportunities and integrate machine learning and AI-driven insights into operational tools, workflows, and dashboards
Analyze incident and historical data to uncover leading indicators and predictive signals
Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining
Break down targeted manual processes into reusable software modules that leverage machine learning models
Build emulation and simulation environments (digital twins) of the infrastructure to test AI/ML-driven automation under realistic scenarios and allow for faster ideation and iteration for architects and engineers.
Develop algorithms and frameworks to integrate machine learning and AI technologies into our orchestration platform
Ensure service reliability, performance, and operational uptime through code-driven solutions.
Conduct root cause analysis, design fault-tolerant architectures, and enable self-healing automation.
Implement monitoring dashboards and KPIs to provide visibility into automation and tooling performance.
Collaborate with cross-functional teams including network engineers, software developers, machine learning engineers, and operations teams across the enterprise.
Support the integration of commercial and open-source tools while maintaining a vendor-agnostic implementation.
Requirements
7+ years of software engineering experience, with expertise in automation, machine learning, and AI technologies
Proven hands-on experience building production-grade ML models and inference pipelines; strong proficiency with modern ML frameworks such as PyTorch, TensorFlow, and scikit-learn
Proven hands-on experience building frontends, APIs, and backend services; strong proficiency with Python, JavaScript, TypeScript, Go, or Rust
Strong hands-on experience building and deploying machine learning models on event-driven or streaming data in production
Solid foundation in statistics, data analysis, and applied machine learning techniques
Experience working with large-scale, real-world datasets (noisy, incomplete, non-standardized, and evolving)
Experience operationalizing models in distributed, production environments
Ability to translate ambiguous operational problems into solvable machine learning use cases
Experience with modern cloud platforms, container orchestration (Kubernetes/Docker), identity/auth frameworks, and data and workflow orchestration.
Experience with AI/ML technologies and data engineering concepts.
Preferred: Proven hands-on experience building AI agents.
Preferred: Certifications such as Kubernetes (CKA/CKAD), AWS/Azure/GCP, CCNP/DevNet, or NVIDIA AI engineer.
Preferred: Experience developing low-code/no-code automation platforms or reusable developer toolkits.
Benefits
A bonus and/or long-term incentive units may be provided as part of the compensation package
Full range of medical, financial, and/or other benefits, dependent on the level and position offered