DevOps Engineer designing, automating, and maintaining CI/CD pipelines across various environments at Age of Learning. Collaborating to ensure operational excellence and security in infrastructure.
Responsibilities
Design, implement, and maintain GitLab CI/CD pipelines for multi-platform builds including Linux, Windows, and macOS
Manage and optimize Kubernetes infrastructure across multiple production and development EKS clusters and on-premises clusters
Build and maintain Infrastructure as Code using Ansible, CloudFormation, Packer, and OpenTofu
Implement and maintain GitLab Runner infrastructure across multiple environments
Deploy and maintain monitoring solutions using Datadog for Kubernetes clusters and infrastructure health
Work with container orchestration using Docker, Kubernetes, and Helm charts
Implement security best practices including SAST scanning, vulnerability management, and zero-trust architecture
Collaborate with development teams to identify and resolve deployment bottlenecks
Build creative and optimized solutions to address infrastructure and automation requirements
Contribute to and follow best practices established by the company
Requirements
7+ years of hands-on DevOps engineering experience in production environments
Strong experience with GitLab CI/CD including pipeline design, GitLab Runner configuration, and cache optimization
5+ years of hands-on Kubernetes experience including EKS cluster management, bare-metal clusters, Helm charts, and GitOps workflows
Strong proficiency with Infrastructure as Code using OpenTofu with multiple providers (AWS, Vault, Okta, Datadog)
Experience with Packer for automated image building across multiple operating systems
Hands-on experience with Docker containerization and container orchestration
5+ years working with AWS services including S3, EC2, IAM, EKS, Transit Gateway, and CloudFront
Proficiency in scripting and automation using Python, Bash/Shell, and TypeScript
Strong experience with HashiCorp Vault for secrets management and authentication
Experience with monitoring and observability tools, preferably Datadog
In-depth knowledge of Git workflows, branches, tags, hooks, and GitOps practices
Experience with VMware vSphere or similar virtualization platforms
Strong understanding of security best practices, including SAST, vulnerability scanning, and zero-trust architecture
Experience troubleshooting build and configuration issues on Linux, macOS, and Windows
Deep knowledge of Linux administration, with a focus on networking
Experience with Flux CD for GitOps-based continuous deployment
A security-first mindset
Excellent written and verbal communication skills
Benefits
Coverage of 90% of employee health and welfare benefits premiums and 65% of dependent benefits premiums
A 401(k) program with employer match
15 paid vacation days (increases to 20 days on your 3rd anniversary), 12 observed national paid holidays, 9 sick days, and 16 paid volunteer hours per year
A flexible work culture: hybrid (2 or more days in the office) or fully remote options are available for most positions