Lead DevOps Engineer designing cloud infrastructure for ML/AI solutions in medical imaging. Collaborating across teams to build scalable, secure platforms that optimize data operations.
Responsibilities
Partner with ML research, data engineering, and application teams to translate requirements into reliable, secure, and cost-effective platform capabilities.
Lead design reviews, RFCs, and proof-of-concepts; mentor team members on cloud, Kubernetes, and data best practices.
Own incident response for platform components and drive continuous improvement through automation and standards.
Design and implement secure, scalable, multi-cloud (GCP + AWS) configurations.
Establish and maintain infrastructure as code (IaC) standards with Terraform.
Lead cloud-to-cloud data migration including secure transfer planning, checksum/manifest validation, parallelization, and cutover strategy.
Implement robust ingestion pipelines for medical images and metadata into structured data stores with schema management, versioning, and data lineage.
Optimize storage tiers and caching strategies for high-throughput image workloads.
Establish cost observability with budgets, alerts, showback/chargeback, and automated idle resource cleanup.
Own permissions and access management across clouds.
Plan and execute the wind-down of, and exit from, prior cloud providers: data egress, dependency mapping, app cutover, contract/savings plan termination, and archival with retention policies.
Stand up and maintain managed ML platforms (Vertex AI) or managed Kubernetes clusters (GKE/EKS) with CI/CD for pipelines, images, and deployments.
Partner with data/ML teams to codify data management practices: versioned datasets, reproducible preprocessing, clear lineage, and documentation.
Requirements
7+ years in DevOps/SRE/Platform roles, including multi-cloud (AWS/Azure/GCP) experience
Deep proficiency with Terraform, CI/CD (GitHub Actions/GitLab/CodeBuild/Cloud Build), and Kubernetes (EKS/GKE)
Hands-on experience with GPU workloads for ML training/inference and object storage patterns for large image datasets
Proven track record in data migration (cloud-to-cloud), structured data ingestion (e.g., BigQuery/Redshift/Postgres), and schema/governance
Site Reliability Engineer at Thales managing secure cloud environments on AWS and GCP. Ensuring compliance, security, and availability of critical cloud platforms with DevSecOps practices.
DevOps Intern at CCC Intelligent Solutions focusing on cloud infrastructure management and AI model deployment. Gain hands-on experience in DevOps and cloud automation with AWS and Azure.
System Administrator and DevOps Engineer managing running systems in web-based software development. Collaborating on infrastructure and supporting product development with a small team in Bremen.
DevOps Engineer collaborating with teams to ensure reliable software delivery. Focus on CI/CD workflows and platform services within a hybrid work environment.
DevSecOps Engineer embedding security controls into AI development workflows within a financial services environment. Focus on securing AI-generated code through CI/CD pipeline enhancements and SAST integration.
DevOps Engineer ensuring the stability, scalability, and security of systems in all environments at a leading fintech company. Focused on automation and CI/CD to enhance operational excellence and collaboration.
DevOps Engineer focused on SRE principles and AWS-centric infrastructure for VERBI Software. Managing reliability, scalability, and modernization with a hands-on approach and technical mentorship.
Senior DevOps Engineer translating high-level requirements into scalable AWS architectures. Leading delivery of robust solutions and elevating engineering standards across the organization.