Staff Site Reliability Engineer managing large-scale systems and ensuring infrastructure reliability for NordVPN's services. Collaborate on automating platforms and solving complex technical challenges.
Responsibilities
Oversee key projects and deliver them on time
Collaborate closely with stakeholders and mentor colleagues
Ensure quality and reduce technical debt
Drive engineering excellence and protect solution quality
Help teams navigate data and design scalable infrastructure
Automate deployments and improve delivery speed
Troubleshoot and resolve critical issues in complex systems
Integrate AI into workflows to enhance delivery
Requirements
Observability experience with monitoring tools (OpenSearch, VictoriaMetrics, Prometheus, etc.)
Experience operating highly available SQL and NoSQL databases (MySQL, PostgreSQL, Cassandra, etc.)
Experience building meaningful data visualization dashboards (Grafana, OpenSearch Dashboards)
Alerting and anomaly detection experience
Proficiency in programming languages (Python, Go, Rust, C)
Strong knowledge of Linux systems (especially Debian-based)
Site Reliability Engineer working on Linux systems for observability platforms and logging. Design and maintain applications, support network visibility, and collaborate with teams.
DevOps Engineer working at White Circle, focusing on infrastructure for AI systems. Involves managing production environments, Kubernetes, CI/CD pipelines, and automation tools.
Airflow Reliability Engineer on the Customer Reliability Engineering team at Astronomer. Working with clients on optimizing their use of the managed Airflow service in a hybrid role in Hyderabad.
Full-Stack Engineer enhancing engineering productivity at Fidelity. Building internal tools for SRE teams to improve operational efficiency and reliability.
DevOps Engineer at Cloudogu working with development and operations for reliable software delivery. Focusing on CI/CD, infrastructure automation, and platform services in an agile environment.
Jr. DevOps Engineer supporting and improving CI/CD pipelines and Linux systems at Swift. Collaborating with senior engineers in a hands-on learning environment.
Senior DevOps Engineer I managing automation tooling and multi-cloud infrastructure at Spring Health. Collaborating with AI and Infrastructure teams in a hybrid Seattle office.
Site Reliability Engineer for a cloudified backup platform using Commvault technology at Expleo. Joining a dynamic team to ensure backup infrastructure scalability and reliability.
Site Reliability Engineer responsible for designing and maintaining scalable services with high availability. Collaborating with development teams to enhance reliability and operational excellence.