About the role

Site Reliability Engineer focused on designing and maintaining observability solutions for a fintech company. The role involves collaborating across teams and automating infrastructure for global payment processing.

Responsibilities

Own OpenTelemetry pipelines: design, implement, and maintain observability pipelines across the three primary signals—logs, metrics, and traces—ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability.
Empower engineering teams: build self-service automation and tooling that enables development teams to instrument and use observability without manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry.
Support incident management: act as the engineering lead for our Incident Management Team by designing processes, playbooks, checklists, and automations for engineers to follow during incidents.
Collaborate across teams: work with members from nearly every team across the business to understand their monitoring, alerting, and SLO/SLA requirements, and design systems and processes that meet or exceed those requirements. Influence architectural decisions during initial design to ensure resilience and scalability from the outset.
Automate observability infrastructure: use Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and observability configurations across OTEL pipelines.
Define baseline observability standards: design baseline requirements for new and existing services to ensure all dLocal infrastructure and code are monitored consistently and accurately.
Own technical and security health: take full ownership of dLocal’s infrastructure reliability, ensuring adherence to key availability and security KPIs.
Optimize alerting systems: continuously refine alerting signals to minimize noise, ensure alerts are actionable, reduce fatigue, and improve response efficiency.

Requirements

4+ years of experience as an SRE or in a closely related observability-focused role.
Expertise in Kubernetes, including core components, deployment methods, and monitoring best practices.
Familiarity with OpenTelemetry, including configuring OTEL collectors, instrumentation, and pipeline optimization.
Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog.
Hands-on experience with IaC tools (Terraform) and GitOps/CI-CD solutions (Argo CD, GitHub Actions, or similar).
Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows.
Strong scripting skills (Python, Go, or similar) for automating observability tasks.
Problem-solving mindset with the ability to collaborate with cross-functional teams to drive reliability improvements.
Cloud experience, especially AWS and ECS-based workloads.
Experience managing observability pipelines at scale in high-throughput environments.
Familiarity with Configuration-as-Code tools (Ansible, Chef, or SaltStack) for managing configurations across legacy instances.
Database performance monitoring experience, particularly in large-scale distributed environments.

Benefits

Flexibility: flexible schedules driven by performance.
Fintech industry: work in a dynamic, fast-evolving environment with ample opportunity to build and innovate.
Referral bonus program: our employees are our best recruiters—refer a great candidate for a role and get rewarded.
Social budget: receive a monthly budget to spend with your team (in person or remotely) to strengthen connections.
dLocal Houses: rent a house to work remotely with your team for a week anywhere in the world—we’ve got you covered.
Flexible work approach: we focus on impact and productivity over fixed hours. Depending on your role and location, you’ll combine self-managed focused time with in-person collaboration in our hubs.

Hybrid SRE, Technical Referent at dLocal
