Senior Site Reliability Engineer ensuring reliability, scalability, and performance of services at Granicus. Leading automation processes and implementing best practices in site reliability engineering.
Responsibilities
Provide production support in shifts according to the team's on-call roster.
Work on SRE projects and on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
Monitor and maintain systems.
Respond to alerts and incidents promptly to ensure high availability.
Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident postmortems, and implementing long-term fixes to prevent recurrence.
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Partner closely with DevOps and Software Engineering teams to enhance system reliability.
Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.
Requirements
At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale high-availability systems.
Expertise in monitoring and observability - Elastic, AWS CloudWatch, Azure Monitor
Expertise in Linux/Windows operating systems and networking
Advanced knowledge of cloud services (AWS and Azure)
Advanced knowledge of container technologies - Docker and Kubernetes (K8s)
Proficiency in databases and queries - MSSQL, PostgreSQL, MongoDB, MySQL
Proficiency in scripting - Python, PowerShell, Bash
Working experience with CI/CD tools - GitLab, Azure DevOps, or similar
Working experience with IaC tools - Terraform, Ansible
Working experience with configuration management - Chef
Working experience with incident response tools - PagerDuty, Jira
Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.
Benefits
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more