About the role

We are looking for an MLOps Engineer to manage AI pipelines for computer vision models. The role involves streamlining the end-to-end model lifecycle in a hybrid work environment.

Responsibilities

Own the end-to-end ML pipeline for computer vision: data prep, training, evaluation, model packaging, artifact/version management, deployment, and monitoring (local GPU cluster + GCP).
Design and maintain containerized workflows for multi-GPU training and distributed workloads (e.g., PyTorch DDP, Ray, or similar).
Build and operate orchestration (e.g., Airflow/Argo/Kubeflow/Ray Jobs) for scheduled and on-demand pipelines across on-prem and cloud.
Implement and tune resource allocation strategies based on current and upcoming task queues (GPU/CPU/memory-aware scheduling; preemption/priority; autoscaling).
Introduce and integrate monitoring/telemetry for:
job health and failure analysis (retry, backoff, alerts),
data/feature drift and model performance (precision/recall, latency, throughput),
infra metrics (GPU utilization, memory, I/O, cost).
Harden GCP environments (permissions, networks, registries, storage) and optimize for reliability, performance, and cost (spot/managed instance groups, autoscaling).
Establish model governance: experiment tracking, model registry, promotion gates, rollbacks, and audit trails.
Standardize CI/CD for ML (data/feature pipelines, model builds, tests, and canary/blue-green rollouts).
Collaborate with CV researchers/engineers to productionize new models and improve training throughput & inference SLAs.
Continuously improve documentation: update existing pipeline docs and produce concise runbooks, diagrams, and “how-to” guides.

Requirements

Hands-on MLOps experience building and running ML pipelines at scale (preferably computer vision) across on-prem GPUs and a public cloud (GCP preferred).
Strong with Docker and Docker Compose in local and cloud environments; solid understanding of image build optimization and artifact caching.
GitLab CI/CD expertise (modular templates, YAML optimization, build/test stages for ML, environment promotion).
Proficiency with Python and Bash for pipeline tooling, glue code, and automation; Terraform for infra-as-code (GCP resources, IAM, networking, storage).
Experience with orchestration: one or more of Airflow, Argo Workflows, Kubeflow, Ray, or Prefect.
Experience operating GPU workloads: NVIDIA driver/CUDA stack, container runtimes, device plugins (k8s), multi-GPU training, utilization tuning.
Observability & monitoring for ML and infra: Prometheus/Grafana, OpenTelemetry/Loki (or similar) for metrics, logs, traces; alerting and SLOs.
Experiment tracking / model registry with tools like MLflow or Weights & Biases (runs, params, artifacts, metrics, registry/promotion).
Data versioning & validation: DVC/lakeFS (or similar), Great Expectations/whylogs, schema checks, drift detection.
Cloud services: GCP (Compute Engine, GKE or Autopilot, Cloud Run, Artifact Registry, Cloud Storage, Pub/Sub). Equivalent AWS/Azure experience is acceptable.
Security & compliance for ML stacks: secrets management, SBOM/image scanning, least-privilege IAM, network policies, artifact signing.
Solid understanding of containerized deployment patterns (blue-green/canary), rollout strategies, and rollback safety.

Benefits

Salary from **2,500 EUR to 5,500 EUR per month** (before taxes)
A Birthday Gift
**After Probationary Period**
**Health Insurance**
**Health Recovery Days** (which can be taken as you need)
Paid **Study Leave**
Funding for the purchase of **Vision Glasses** after one (1) year of service

Hybrid MLOps Engineer

at Aerones