Senior DevOps Engineer optimizing cloud infrastructure for a fast-growing digital classroom platform, collaborating with teams to improve application reliability, performance, and scalability while ensuring system stability.
Responsibilities
Analyze and optimize the reliability, performance, and resource utilization of cloud infrastructure.
Develop and maintain automation scripts for deployment, monitoring, and maintenance tasks.
Implement infrastructure as code (IaC) to automate the provisioning and configuration of infrastructure components.
Design and implement monitoring solutions to proactively identify and address issues.
Participate in on-call rotations and respond to incidents to ensure system stability and performance.
Conduct capacity planning to anticipate future resource needs and optimize infrastructure scalability.
Define and track reliability metrics to measure and improve system performance.
Prepare and present reports on system reliability and performance.
Work closely with software development teams to influence and improve the reliability and scalability of applications.
Conduct post-incident reviews to identify root causes and implement preventive measures.
Troubleshoot complex issues in a production environment.
Requirements
7+ years of experience in a DevOps, SRE, or similar role.
Bachelor's degree in Computer Science, Information Technology, or a related field.
Relevant experience in software engineering, systems administration, or a related field.
Proficiency in programming languages (e.g. Python, Go, Ruby).
Strong scripting skills for automation tasks (e.g. Bash, Python).
Hands-on experience and in-depth knowledge of cloud platforms (e.g. Google Cloud, AWS) and container orchestration tools (e.g. Kubernetes), including adherence to best practices and resource optimisation.
A solid understanding of core networking concepts (e.g. TCP/IP, DNS, load balancing).
Familiarity with Infrastructure as Code (IaC) tools (e.g. Terraform) and/or configuration management tools (e.g. Ansible, Puppet, Chef).
Experience with infrastructure monitoring, logging, and alerting tools (e.g. Datadog, Prometheus, Grafana, PagerDuty) and with log analysis.
Strong collaboration and communication skills to work effectively with cross-functional teams.
Ability to analyze complex systems and troubleshoot issues effectively.
Benefits
A people-first employer on an inspiring mission to build the future of education while changing the lives of millions.
A high-calibre, diverse team ranging from successful startup veterans to Fortune 500 and big tech professionals.
Continuous learning and development opportunities, including subsidised course fees, certifications, conferences, and free access to Udemy and more.
A strong mission: the satisfaction of knowing you’re not only helping modern-day superheroes, aka teachers, but also helping them shape the minds of future generations across the globe.
Happy customers: helping thousands of schools worldwide through the digital transformation of education for the 21st century.
One of the most popular and fastest-growing EdTech platforms worldwide.