Software Engineer, SRE responsible for maintaining systems and increasing infrastructure reliability at Mercari. Collaborating across teams to deliver high-impact features with a focus on efficiency and support.
Responsibilities
Operate and maintain shared components used by multiple teams at Mercari US, which directly affect overall production reliability
Define and measure system reliability goals using SLOs/SLIs
Continuously monitor capacity, performance, and cost of systems in both production and development
Build, run, and integrate software to improve the availability, scalability, latency, and efficiency of our system as a whole
Define, manage, and run incident management processes (on-call, incident response, postmortems)
Mentor junior engineers, lead code reviews, and actively contribute to architectural decisions and technical documentation.
Collaborate with cross-functional teams including product, engineering, and QA to deliver high-impact features and improvements.
Requirements
5+ years of experience working with and administering:
production DBMS/MySQL clusters
Cloud Native environments (GCP/AWS/Azure)
Docker and Kubernetes
5+ years of professional experience maintaining and operating infrastructure
Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).
Experience with, and passion for, optimizing the performance of databases, networking, and microservices
Strong programming skills in at least one programming language
Excellent English communication skills, with the ability to collaborate effectively across functions and regions.
Demonstrated ability to mentor and guide junior engineers.
SRE responsible for ensuring reliability and performance of IT systems at a digital transformation company specializing in public sector efficiency. Collaborating on system health, incident response, and automation tasks.
Senior DevOps role at Beyond Soluções managing CI/CD for .NET and Kubernetes applications. Collaborating on cloud solutions while fostering a culture of innovation and quality.
Senior Software Engineer at PayPal managing cloud infrastructure and DevOps solutions. Delivering complete SDLC solutions and guiding engineering teams for scalable and reliable services.
Senior Site Reliability Engineer at Diligent leading reliability, automation, and observability across cloud infrastructure. Building tools for incident response and enhancing performance in fast-paced environments.
Perception Deployment Engineer deploying deep learning models on embedded systems at Caterpillar. Collaborating with cross-functional teams on the integration and optimization of perception modules in vehicles.
Principal Site Reliability Engineer at AT&T responsible for designing scalable solutions for critical operations with minimal downtime. Collaborating with teams to monitor and improve system performance in cloud environments.
DevOps Engineer managing AI SaaS infrastructure at a high-growth European company. Supporting AI model deployment, ensuring platform security and compliance, and integrating multiple systems.
Engineering Manager leading observability platform teams at LexisNexis. Owning operational excellence across the software delivery lifecycle in Raleigh, NC.
Reliability Engineer optimizing site facility infrastructure and utility systems at Roche. Conducting root cause analyses and developing maintenance plans to enhance reliability and efficiency.
DevOps SME designing, implementing, and operating multi-cloud platforms for The Missing Link. Collaborating with engineering, security, and operations teams while embedding DevOps best practices.