Senior DevOps Engineer managing AWS and Azure cloud infrastructure for a startup SaaS company. Focused on CI/CD, system reliability, and security best practices.
Responsibilities
Design, implement, and own CI/CD pipelines across multiple services to streamline software development and deployment.
Maintain, optimize, and architect cloud infrastructure (AWS, Azure) to ensure scalability, security, reliability, and cost-effectiveness.
Automate infrastructure provisioning, monitoring, and management using Infrastructure as Code (Terraform, Ansible, etc.) with modular, reusable patterns.
Monitor and improve system performance, troubleshoot production issues, and ensure high availability and reliability across environments.
Collaborate with software engineers to enhance deployment strategies and build internal tooling that improves development workflows.
Implement security best practices across infrastructure, networking, identity, and access.
Own or support security and compliance requirements, including SOC 2 controls, documentation, and evidence collection.
Manage and enhance containerization and orchestration tooling on any modern orchestration platform (ECS, Kubernetes, or similar).
Optimize logging, monitoring, and alerting systems (ELK stack, Datadog, etc.) to improve visibility and accelerate incident response.
Build and maintain observability tooling, including metrics, logging, and tracing (OpenTelemetry experience is a strong plus).
Optimize cloud resource usage and implement cost-efficient infrastructure practices.
Stay current with the latest DevOps best practices, tools, and industry standards.
Requirements
6–8+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering.
Strong proficiency in cloud platforms (AWS required; Azure/GCP a plus) and cloud-native architectures.
Experience designing CI/CD pipelines and deployment workflows end-to-end.
Proficiency with Infrastructure as Code tools (Terraform preferred; CloudFormation or Ansible also welcome).
Strong software development skills, with the ability to write clean, maintainable code in a modern programming language.
Hands-on experience with containerization and orchestration on any major platform (ECS, Kubernetes, or similar).
Strong understanding of security best practices, IAM, networking, system administration, and distributed systems.
Experience supporting or contributing to security and compliance programs, ideally SOC 2 or similar.
Proficiency with monitoring and observability tooling (ELK, Datadog, Prometheus, OpenTelemetry, etc.).
Strong understanding of cloud cost optimization strategies.
Comfortable operating autonomously in an agile, fast-paced startup environment with a high degree of ownership.
Benefits
Medical, dental, vision, short-term disability (STD), and life insurance (100% company-paid for the employee)