MLOps Engineer building and operating scalable ML training & serving infrastructure for Epidemic Sound’s music search, recommendation, and audio ML systems.
Responsibilities
Design, build, and maintain the core infrastructure that powers machine learning applications.
Streamline the entire ML lifecycle and implement next-generation technologies.
Build scalable infrastructure for training and serving machine learning models using Kubernetes (GKE).
Develop and optimize CI/CD pipelines to streamline the ML application lifecycle from development to production.
Implement and manage robust ML monitoring and observability solutions to ensure production model reliability.
Collaborate with Machine Learning Engineers, Data Engineers, and product teams to integrate data pipelines and tools like Vertex AI and feature stores.
Work within a team of MLOps engineers inside a larger cross-functional group.
Requirements
Proven experience in MLOps, with a deep understanding of best practices like ML monitoring and CI/CD for machine learning.
Proficiency with Kubernetes in a production environment.
Hands-on experience with pipeline orchestration tools such as Vertex AI Pipelines, Kubeflow Pipelines, Flyte, or Metaflow.
Infrastructure as Code skills, particularly with Terraform.
Experience with cloud-native data processing services like Dataflow or Airflow.
Nice to have: Experience with Google Cloud Platform services like BigQuery and Google Cloud Storage.
Nice to have: Knowledge of advanced data engineering practices.
Nice to have: Familiarity with observability tools for production infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).
Nice to have: Experience with serverless inference frameworks such as Seldon Core.
Nice to have: Familiarity with Music Information Retrieval.
Senior Staff Machine Learning Engineer leading technical architecture for GEICO's AI Agent Platform. Driving innovation and enhancing productivity for internal associates and customers.
Staff Machine Learning Engineer developing the next generation of AI Agent OS and SDKs for GEICO. Key responsibilities include architecting scalable systems and implementing observability frameworks.
Senior Machine Learning Engineer at Bumble developing scalable AI systems for personalized user interactions. Leading machine learning model development and deployment from exploration to production.
Lead Machine Learning Engineer at Bumble shaping user connections through machine learning. Driving end-to-end AI solutions while mentoring engineers in a hybrid work environment.
Designing and operating cloud-based MLOps capabilities supporting analytical and generative AI models. Collaborating with data science and business teams for high-impact AI solutions.
Machine Learning Engineer analyzing data structures and developing ML models for customer profiling in Azerbaijan. Collaborating on probabilistic modeling and data quality improvement.
Machine Learning Engineer at HackerRank working on integrity systems to improve model quality. Collaborating on strategies for new signals like audio analysis and behavioral anomalies.
Machine Learning Engineer developing integrity systems for assessing model quality at HackerRank. Collaborating on multimodal signal processing and improving model performance.
Architect designing enterprise-grade AI/ML architectures for Quantiphi. Leading AI applications and ML strategy with a focus on scalability, security, and integration.
Software Engineer for ML Infrastructure at Slack, architecting systems to support large-scale AI deployment and reliability. Engaging in deep systems engineering focused on the ML lifecycle and infrastructure scalability.