Site Reliability Engineer ensuring the reliable operation of the payment platform at CellPoint Digital. Collaborates with teams to drive automation and reliability across global infrastructure.
Responsibilities
As an SRE at CellPoint Digital, you’ll play a key role in keeping our payment platform reliable, secure, and scalable as it processes thousands of payments per second.
Working closely with our Product, Development, and Architecture teams, you’ll blend hands-on operational excellence with a software engineering mindset to drive automation, observability, and reliability across our global infrastructure.
Requirements
Ensure the production environment runs smoothly, with a holistic view of system health
Build software and systems to manage infrastructure and applications
Drive improvements in reliability, quality, and delivery speed of our payment solutions
Measure and optimize system performance, always looking to innovate and get ahead of customer needs
Provide operational support and engineering expertise for large-scale, distributed systems
Collaborate with Product, Development, and Architecture to define and share SLAs and improve system reliability
Partner with our Release Manager to deploy and troubleshoot new versions of our platform and services
Participate in an on-call rotation (Grafana IRM) to respond to incidents impacting availability and to support internal engineering teams
Prevent incidents through robust automation, monitoring, and proactive engineering
Run our modern stack: Google Cloud Platform, Kubernetes, Terraform, GitHub Actions, etc.
Design, build, and maintain core infrastructure that supports massive scale and high availability
Debug production issues across services and infrastructure layers
Plan and execute infrastructure growth to meet future demand
Benefits
Competitive salary in a fast-growing start-up
Rewards & Recognition system
Opportunity for personal and professional growth in a dynamic industry
Work from anywhere in the world; we're a fully distributed company, and we provide the tools, culture, and support to make your work setup work for you
Occasional travel to Europe (UK, Copenhagen, Bulgaria)