Site Reliability Engineer at Plenful maintaining system performance and reliability. Collaborating with teams to improve operations and ensure system stability in a fast-paced environment.
Responsibilities
Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks
Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data
Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres
Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured
Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation
Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers
Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution
Maintain efficient and predictable resource usage across compute, networking and storage
Support security and compliance work including patching, audit readiness and vulnerability management
Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication
Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers
Requirements
5+ years of professional engineering experience at a B2B SaaS company
Strong experience operating production systems in cloud environments, ideally AWS
Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres
Solid understanding of observability tooling, performance debugging and system behavior under load
A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude
Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment
Benefits
Unlimited PTO
Fully covered health insurance (medical, dental, and vision)