Senior Site Reliability Engineer improving the reliability of Acuity’s cloud services. Collaborating across teams to define observability standards and incident response at the Cork Digital Centre of Excellence.
Responsibilities
Own the availability, reliability, and performance of Reflect’s production environments.
Define, track, and report on service health metrics including uptime, availability, and reliability indicators.
Drive root cause analysis (RCA), analyze system logs, and ensure corrective and preventive actions are implemented.
Work as part of a global team providing operational and escalation coverage, leading incident response and recovery for critical services.
Automate operational workflows to reduce manual toil and improve consistency.
Support and improve deployment processes for features, patches, and hotfixes while maintaining a strong security posture.
Create, maintain, and continuously improve runbooks and standard operating procedures (SOPs).
Design and evolve monitoring, alerting, and observability standards across the platform.
Build and maintain dashboards and alerts that provide clear, actionable insight into system health.
Enable engineers to embed reliability best practices into system design and delivery.
Requirements
5+ years of professional experience in software engineering, SRE, or a related role.
Strong hands‑on experience with Microsoft Azure services such as AKS, Azure Monitor / Log Analytics, Key Vault, ACR, VNets, and Managed Identity.
Deep experience with containerized and orchestrated environments (Docker, Kubernetes).
Proven experience operating and supporting production SaaS systems at scale.
A keen eye for detail and a knack for troubleshooting complex issues.
Excellent communication and collaboration skills, with the ability to work effectively across teams.
A passion for learning and a drive to stay up to date with emerging technologies.
Bachelor's degree in Computer Science or a related field.
DevOps Engineer designing and implementing CI/CD pipelines and supporting cloud-based solutions at eInfochips. Collaborating with QA and Engineering teams on release readiness.
DevOps Engineer III providing L3 support for Operations across Edge/on-prem and cloud environments. Building automations and handling incidents for customer deployments.
SRE leading reliability and operational excellence at a mortgage tech platform. Designing systems, tooling, and processes for managing Pylon's production systems in Palo Alto.
Senior Build & Release Engineer at GXO Logistics responsible for CI/CD solutions and build automation across various environments. Collaborating with teams for smooth software deployments and mentoring staff.
Azure Senior DevOps Engineer supporting critical cloud systems in the Azure Government Cloud environment. Leading CI/CD pipeline design and implementation with operational best practices.
Automation Engineer enhancing infrastructure and automating operations for client systems. Working in a complex environment oriented towards automation, security, and performance.
Graduate Reliability Engineer at GKN Aerospace enhancing operational excellence through data analysis and project participation within large structural assemblies.
Site Reliability Engineer at WRITER, ensuring 24/7 availability and performance of AI-powered workflows. Collaborating on scalable infrastructure solutions that strengthen enterprise customer trust.
Engineer at Trading Technologies improving platform stability through coding and automation. Focus on building advanced monitoring tools for global trading operations.