CloudOps SRE at Vendavo ensuring the reliability and efficiency of Cloud Services. Collaborate with development and operations teams to maintain AWS and Azure infrastructures.
Responsibilities
Deploy and maintain cloud infrastructure, ensuring optimal performance, security, and scalability
Develop automation solutions to streamline processes, such as creating scripts to run specific tasks on cloud systems
Monitor cloud environments, optimize resource usage, and implement performance improvements
Troubleshoot and resolve incidents, collaborating with development and delivery teams to minimize downtime and maintain service quality
Implement security measures to protect data and ensure compliance with organizational policies
Work closely with product development, delivery, and architecture teams to evaluate and implement new services based on requirements
Create and maintain technical documentation for cloud infrastructure and customer-specific operational processes
**NOTE: This role works in shifts (Morning, Mid, and Night) that rotate every month to provide 24x7 coverage.**
Requirements
Strong experience in deploying and managing Azure or AWS infrastructure in a production environment
Ability to work independently and as part of a team, and to multitask to a high degree
In-depth knowledge of AWS/Azure cloud infrastructure administration
Experience with infrastructure-as-code tools such as Terraform
Experience with monitoring and logging tools such as Prometheus, Grafana, and Graylog
Solid understanding of networking concepts and protocols (TCP/IP)
Strong scripting and automation skills (e.g., Bash, Python)
Experience with CI/CD tools like Jenkins or Azure Pipelines
Excellent problem-solving and troubleshooting skills
Strong communication and collaboration skills to work effectively in cross-functional teams
Defining and implementing Service Level Indicators and Service Level Objectives
Building strong observability practices within end customers' platforms
Creating dashboards and configuring alerts to provide real-time visibility into system health
Experience in designing, analysing, and troubleshooting distributed systems
Knowledge of Linux/Unix fundamentals and TCP/IP networking
Excellent communication skills when dealing with both technical and non-technical stakeholders
**Preferred Qualifications/Skills:**
AWS/Azure certified
Experience with configuration management tools like Ansible and Chef
Benefits
Professional growth and development opportunities
Working within a team of friendly, skilled people where help is always within reach
Flexible working hours
4 recharge days: the entire company pauses in all geographies for 1 day each quarter. You can spend this day in whatever way helps you recharge, regain energy, and dive back into the next workday
High-end laptop (Dell or Mac)
Competitive pay and bonus
18 vacation days per year, in addition to 15 days of sick/casual leave per calendar year
16 hours of paid volunteer time off per year
Wedding gift and newborn gift allowance for employees.
26 weeks of paid maternity leave and one week of paid paternity leave.
12 wellness leaves for women employees
Health insurance of up to 7 lacs covering the employee, spouse, 4 dependent children, and parents. Vendavo pays 100% of the premium.
Group Term Insurance coverage of up to three times annual CTC. Dependents are not covered.
Group Personal Accident coverage of up to three times annual CTC. Dependents are not covered.