DevOps Engineer managing complex incidents and automations in L3 support for Everseen. Driving best practices and collaborating across teams on cutting-edge AI solutions.
Responsibilities
You will be part of the L3 support team for Operations across Edge/on‑prem and cloud, owning complex incidents end‑to‑end: triage, deep‑dive debugging, root‑cause analysis, remediation, and follow‑ups.
To reduce Ops toil, you will build targeted automations (Python, Bash, Ansible) and automate new and existing SOPs used by Operations.
You will execute safe deployments and upgrades via GitOps and IaC pipelines (Flux, Ansible, Terraform) on AKS and GKE, coordinating validation and rollback plans, and will help maintain the existing GitLab CI/CD pipelines together with the DevOps engineering teams.
You will design and continuously refine Alertmanager rules and standardize actionable Grafana dashboards with Operations, ensuring effective use of Prometheus metrics and logs (Grafana Alloy, Thanos).
Beyond day‑to‑day operations, you’ll apply deep DevOps, CI/CD, and infrastructure automation expertise, drive best practices, share knowledge through workshops and mentoring, write and maintain documentation and SOPs (Standard Operating Procedures), test infrastructure, and collaborate across teams to optimize systems and workflows.
Requirements
4+ years in DevOps-related roles with a strong focus on automation.
Proficient in DNS, routing, container communication, firewalls, reverse proxying, load balancing, edge-to-cloud communication, and troubleshooting.
Strong system administration skills, required for troubleshooting OS-level outages and for deploying and troubleshooting Everseen’s containerized Edge application in customer networks.
Extensive experience with Azure (or GCP), including fully automated infrastructure and deployment.
Experience with monitoring and optimizing cloud costs.
Proven experience in implementing and managing CI/CD pipelines (GitLab CI/CD preferred) and excellent knowledge of Git and associated workflows (e.g., Gitflow).
Proven experience with monitoring, logging, and alerting tools and stacks.
Excellent scripting skills in Bash and Python.
Advanced knowledge of Kubernetes and OpenShift, including cluster management, orchestration, auto-scaling, and deployments using Helm charts and GitOps.
Proven experience with microservices architecture and related deployment strategies.
Expertise with Terraform modules.
Deep experience with Ansible, including writing complex playbooks, roles, and using Ansible Vault for secrets management.
Strong understanding of DevSecOps principles and experience implementing security best practices within CI/CD pipelines.
Excellent presentation, oral, and written communication skills. Fluent business English is a requirement.
A passionate advocate for identifying and delivering solutions that achieve a high level of customer satisfaction.
Demonstrated interest in learning and a strong desire to expand their knowledge in the field.
Capable of engaging in technical discussions with stakeholders, leading DevOps projects, and mentoring and coaching team members.
Benefits
Everseen is committed to creating a safe environment for all employees and has a zero-tolerance policy for bias and discrimination of any kind.
Our work environment is one without offensive, hostile, or intimidating conduct, whether verbal, written, or physical in nature.
Everseen will not tolerate prejudice or discrimination of any kind, including, without limitation, prejudice or discrimination based on race, colour, sex, gender, religion, age, family status, disability of any kind, or sexual orientation.