Manager of Cloud Operations leading SRE practices to ensure reliability and scalability of cloud infrastructure on AWS and Azure. Join a growing team at Vendavo, enhancing customer success through efficient cloud operations.
Responsibilities
Lead, mentor, and develop a team of DevOps and SRE engineers.
Implement and promote SRE principles and practices across the organization.
Define and monitor service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs).
Develop and implement incident response and post-mortem processes.
Drive automation of operational tasks and infrastructure management.
Design, implement, and maintain scalable and resilient infrastructure on Azure and/or AWS.
Implement infrastructure-as-code (IaC) using tools like Terraform.
Ensure security and compliance of cloud environments.
Manage CI/CD pipelines for automated deployments.
Implement and maintain comprehensive monitoring and alerting systems using tools such as Azure Monitor, AWS CloudWatch, Prometheus, and Grafana.
Communicate effectively with stakeholders at all levels.
Hire the right team for the product.
Requirements
Between 12 and 18 years of experience, including leading SRE teams that build highly scalable, secure, efficient, and resilient production systems in AWS and/or Azure.
Proven experience in implementing and managing SRE practices.
Strong understanding of CI/CD pipelines and automation tools.
Proficiency in infrastructure-as-code (IaC) tools (Terraform).
Experience with containerization and orchestration technologies (Docker, Kubernetes).
Strong understanding of networking concepts and protocols.
Experience with monitoring and logging tools (Azure Monitor, CloudWatch, Prometheus, Grafana, ELK stack).
Scripting and programming skills (Python, Bash, etc.).
Experience with various databases (Oracle, SQL Server, etc.).
Benefits
Professional growth and development opportunities.
Working within a team of friendly, skilled people where help is always within reach.
Flexible working hours.
4 recharge days: one day each quarter, the entire company pauses in all geographies. Spend the day in whatever way helps you recharge, regain energy, and dive back into the next workday.
High-end laptop (Dell or Mac).
Competitive pay and bonus.
18 vacation days per year, in addition to 15 days of sick/casual leave per calendar year.
16 hours of paid volunteer time off per year.
Wedding gift and newborn gift allowance for employees.
26 weeks of paid maternity leave and one week of paid paternity leave.
12 wellness leave days for women employees.
Health insurance of up to 7 lacs covering the employee, spouse, 4 dependent children, and parents. Vendavo pays 100% of the premium.
Group Term Insurance coverage of up to three times annual CTC. Dependents are not covered.
Group Personal Accident coverage of up to three times annual CTC. Dependents are not covered.