DevOps & Kubernetes Engineer at an AI software startup near Porto, managing Kubernetes clusters for ML workloads and collaborating on infrastructure solutions.
Responsibilities
Design, deploy, and manage production-grade Kubernetes clusters for ML and microservice workloads
Maintain and optimize container orchestration, including service mesh, network policies, and resource allocation
Oversee CI/CD pipelines using tools like GitHub Actions, GitLab CI, and Terraform
Manage Docker image lifecycle and enforce security best practices
Monitor infrastructure health using Prometheus, Grafana, and centralized logging solutions
Collaborate on Infrastructure as Code (IaC) and ensure scalable, reproducible deployments
Support GPU-based workloads and optimize GPU resource utilization for LLM agents
Maintain Linux-based cloud servers, implement security protocols, and manage DNS, VPNs, and firewalls
Troubleshoot Python microservices and contribute to automation and monitoring setups
Implement model serving and orchestration pipelines (MLflow, Kubeflow, etc.)
Ensure high availability and disaster recovery strategies across systems
Requirements
3-7+ years of advanced hands-on experience with Kubernetes administration, including networking, storage, and security
Proficient in Docker, multi-stage builds, and image lifecycle management
Strong Linux system administration skills (Ubuntu or RHEL-based systems)
Experience with cloud platforms, ideally Google Cloud Platform (GCP)
Solid understanding of CI/CD pipelines using tools like GitHub Actions, GitLab CI, or Jenkins
Familiarity with Infrastructure as Code (Terraform, Ansible) and GitOps workflows
Confident in managing GPU workloads and ML/LLM-serving infrastructure
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK stack
Comfortable with Python microservices and ML workflow troubleshooting
English fluency at C1 level or above
Benefits
Competitive Salary: Commensurate with your experience and contributions.
Flexible Work Setup: On-site collaboration in Porto, with the option for full remote work based on strong performance after onboarding.
Relocation Support: For your on-site onboarding, or if you decide to move to Porto, you receive support with logistics, housing, and onboarding connections to make the move smooth.
Training & Growth Budget: Set aside for conferences, courses, and certifications.
Daily Meal Subsidy: Enjoy lunch on the company when working from the office.
Team Events: From BBQs to game nights and a Christmas party, with the first drinks on the house.
Onboarding Buddy: You won’t be left alone. You are paired with someone who helps you ramp up quickly.