Cloud DevOps Engineer focusing on automated infrastructure across cloud and on-prem environments. Manages Kubernetes and CI/CD pipelines and collaborates with development and operations teams.
Responsibilities
Design, implement, and maintain CI/CD pipelines using Jenkins, GitLab CI, Argo CD, etc.
Automate infrastructure provisioning and configuration using Terraform, Ansible, and Helm.
Deploy, configure, and manage Kubernetes clusters in cloud and on-premises environments (EKS, AKS, GKE, Rancher, RKE2, k3s, OpenShift).
Enforce Kubernetes security best practices (RBAC, PodSecurity, secrets, network policies).
Monitor and tune Kubernetes workloads for performance and reliability.
Administer, operate, and troubleshoot distributed database systems (e.g., Cassandra, MongoDB, CockroachDB, etcd) within Kubernetes, ensuring high availability, data consistency, and performance.
Ensure high availability, scalability, backup/recovery, and disaster recovery strategies for databases.
Implement observability stacks (Prometheus, Grafana, ELK, Zabbix, etc.) for infrastructure and applications.
Partner with dev teams to design scalable deployment patterns and troubleshoot pipeline/build/deploy issues.
Maintain detailed technical documentation for environments, playbooks, and architectural decisions.
Mentor peers and team members in DevOps tools, Kubernetes, and cloud-native practices.
Requirements
3+ years managing Kubernetes in production.
Expertise with container tools (Docker, Podman) and orchestration (Kubernetes, Helm).
Strong CI/CD experience with GitLab, Jenkins, Argo CD, and GitOps workflows.
Proficient in Infrastructure as Code (Terraform, CloudFormation, Ansible).
Deep knowledge of managing distributed databases in Kubernetes, including StatefulSets, PVCs, and dynamic volume provisioning, as well as backup, recovery, scaling, and clustering techniques.
Experience with on-prem, AWS, GCP, Azure, or OpenStack environments; hybrid/multi-cloud experience preferred.
Familiarity with service meshes and Kubernetes networking (Istio, Calico, Cilium).
Proficient in Bash, Python, or similar scripting languages.
Strong analytical and troubleshooting abilities across app, infra, and DB layers.
Clear communication and ability to collaborate across development, QA, security, and operations.
Self-motivated, detail-oriented, and comfortable in fast-paced, on-call environments.
Excellent documentation habits and focus on operational excellence.
Familiarity with compliance standards (FIPS, FedRAMP, FISMA).
Certifications in Kubernetes (CKA/CKAD), AWS/GCP, or Terraform.
Benefits
Competitive Salary & Incentives: We offer a competitive compensation package with pre-IPO equity to reward your hard work and dedication.
Health & Wellness: Comprehensive medical, dental, and vision insurance plans to ensure you and your family stay healthy and covered.
Paid Time Off (PTO): Enjoy a generous PTO policy that includes vacation days, sick leave, and paid holidays to recharge and take care of personal matters.
Flexible Work Environment: We understand the importance of work-life balance. Enjoy the flexibility of remote and hybrid options to create the work schedule that works best for you.
Professional Development: We believe in continuous learning. Access to training, certifications, and educational resources to help you grow in your career and stay ahead of industry trends.
Employee Recognition: We celebrate achievements both big and small, with regular recognition programs and awards that highlight your contributions to our collective success.
Collaborative Culture: Be part of a dynamic, inclusive, and supportive team where innovation and collaboration are at the heart of everything we do.
Parental Leave: Generous parental leave policies to support you during life's important moments.