Site Reliability Engineer at Hewlett Packard Enterprise responsible for enhancing cloud infrastructure and deployment systems, playing a key role in scalability and operational efficiency.
Responsibilities
Enhance Infrastructure as Code (IaC) and enforce best practices.
Optimize cloud infrastructure for scalability, security, and cost-effectiveness.
Develop internal tools to support and streamline cloud platform operations.
Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins.
Address container image vulnerabilities and standardize remediation processes.
Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks.
Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools.
Troubleshoot complex production issues to ensure system reliability and customer satisfaction.
Fine-tune distributed systems such as Apache Kafka and Cassandra.
Collaborate with development, security, and operations teams to align infrastructure with application needs.
Requirements
Minimum of 10 years of hands-on experience in InfraOps, DevOps, or Site Reliability Engineering (SRE)
Proficiency with Linux systems, especially Debian-based distributions
Strong experience with cloud platforms such as AWS and GCP
Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible
Solid programming skills in Python and/or Golang
Deep understanding of containerization (Docker, containerd) and orchestration tools (AWS EKS, GCP GKE)
Experience with GitOps workflows
Proven track record in implementing and maintaining CI/CD pipelines
Strong background in security and familiarity with security programs
Experience with monitoring and logging tools (Prometheus, Grafana, ELK)
Knowledge of both relational (SQL) and non-relational databases
Excellent problem-solving and debugging skills with a strong sense of ownership
Experience managing distributed systems like Apache Kafka and Cassandra
Effective communicator and collaborative team player
DevOps Engineer developing and managing scalable AWS infrastructure for a PropTech startup. Collaborating within a growing tech team to achieve ambitious goals in the legal conveyancing space.
Senior DevOps Engineer leading the design and optimization of cloud infrastructure at Growth Acceleration Partners. Ensuring secure and cost-effective deployments within a fast-paced product development environment.
Advanced DevOps Engineer optimizing infrastructure solutions for engineering teams at a consulting and technology services company. Ensuring secure and cost-effective deployments in a fast-paced environment.
Entry-level DevOps Engineer at Nokia focusing on building and maintaining a CI environment for LTE and 5G solutions. Engage with high-end telecommunication technologies and support development workflows.
AI Security Control Developer/Site Reliability Engineer for RBC's enterprise AI ecosystem. Design, implement, and validate security controls to protect AI systems with 24/7 reliability.
Senior Site Reliability Engineer ensuring scalability and reliability for NGINX systems and SaaS platforms. Collaborating across teams to drive automation and system performance.
Site Reliability Engineer ensuring reliability and performance of data platform services for Veepee. Collaborating on cloud migration, Kubernetes operations, and observability best practices.
Senior Lead Site Reliability Engineer overseeing the stability of critical systems and incident management. Leading reliability efforts for Java applications and supporting a dynamic technology environment.
Infrastructure Architect at Kyndryl acting as the link between clients and Kyndryl. Leading projects from start to finish and ensuring technical solutions meet client needs.
DevOps Engineer configuring and automating network monitoring and automation solutions for Telia’s telecom operations in Finland. Ensuring the performance, resilience, and observability of critical platforms.