Director of Platform and Infrastructure Engineering leading Katalon’s cloud-based infrastructure operations and CloudOps strategy, ensuring high availability, reliability, and security for users worldwide.
Responsibilities
Lead the architecture, design, and end-to-end management of Katalon’s cloud infrastructure (AWS/GCP), ensuring security, scalability, and high availability.
Drive the CloudOps strategy and roadmap in line with the CTO’s direction; oversee provisioning, monitoring, observability, incident response, and disaster recovery.
Build and optimize CI/CD pipelines; increase deployment frequency and system reliability, reduce MTTR, and promote SRE best practices.
Manage cloud budgets, optimize hosting costs, reduce the impact of cloud spend on COGS, and identify cost-saving opportunities.
Lead and develop CloudOps/SRE/DevOps teams, building strong capabilities in automation, reliability, cost optimization, and operational excellence.
Collaborate with Engineering, Security, and Finance to ensure platform performance, security readiness, and accurate cost planning.
Ensure compliance with cloud security standards and regulations, lead audits and remediation efforts, and maintain CloudOps policies, runbooks, and SLAs.
Requirements
Bachelor’s degree in Computer Science, Engineering, or related field; advanced degree (MBA/MSc) preferred.
10+ years of experience in CloudOps/DevOps/SRE or infrastructure engineering, including 5+ years in senior leadership roles.
Proven success leading cloud or platform operations teams in a SaaS/product environment, with experience managing distributed or global teams.
Strong expertise in AWS/GCP (Azure optional), cloud architecture, networking, security, CI/CD (e.g., Argo, GitHub Actions), and Infrastructure as Code (Terraform, CloudFormation).
Deep hands-on knowledge of containers and orchestration (Docker, Kubernetes), observability tools, and incident management practices.
Demonstrated ability to optimize cloud costs, partner with Finance on budgeting/forecasting, and improve operational efficiency.
Exceptional problem-solving, communication, stakeholder management, and executive influencing skills.
Ability to build and scale high-performing teams and drive operational excellence in fast-paced environments.
Nice to Have
Experience in regulated industries with a strong focus on security and compliance.
Familiarity with Agile, DevOps, and SRE best practices.
Experience with modern monitoring and logging tools (Prometheus, Grafana, ELK).
Benefits
Competitive Pay & Bonuses: We believe in rewarding great work! You'll receive an attractive salary package plus performance bonuses to help you meet your financial goals.
Your Health & Happiness Matter: Take care of yourself with our comprehensive health coverage, flexible work options, and generous time off. We understand that life happens outside of work too!
Location-Tailored Benefits: Enjoy a complete benefits package designed specifically for your country, giving you the best coverage where you live.
Everything You Need to Succeed: Work with top-of-the-line equipment and enjoy modern facilities, plus helpful allowances to support your work setup.
A Place Where You Belong: Join our worldwide family where we celebrate what makes each of us unique. Here, everyone has a voice and equal opportunities to shine.
Room to Grow & Thrive: Your success is our success! We foster a trust-based culture where you can develop your skills, take on new challenges, and be recognized for your achievements.