Senior Site Reliability Engineer (SRE) ensuring the reliability and performance of AI products at Plaud. Designing scalable systems and leading incident response to improve operational maturity.
Responsibilities
Ensure reliability and performance of Plaud.ai’s AI products at scale
Design and operate highly available, scalable cloud-native systems for AI workloads
Own production reliability, incident response, and on-call practices
Build observability (metrics, logs, tracing) and reliability automation
Define and manage SLOs, SLIs, and error budgets with engineering teams
Drive postmortems and reliability improvements across the platform
Lead incident response and continuous reliability improvement
Partner with product and engineering teams on reliability design
Improve observability and operational maturity
Requirements
8+ years in SRE, Infrastructure, or Platform Engineering roles
Strong experience with cloud platforms (AWS/GCP/Azure)
Hands-on experience with Kubernetes and distributed systems
Experience with on-call rotations and incident management
Proficient in at least one programming language (e.g., Go, Python, or Java)
Benefits
Meaningful Ownership: An Employee Stock Ownership Plan (ESOP) that gives a real stake in Plaud’s long-term success.
High-Impact Environment: Work in a fast-moving, product-driven environment where your ideas directly shape the future of AI productivity.
Comprehensive Health & Retirement Benefits: Top-tier medical, dental, and vision insurance for employees and dependents, supported by a generous employer subsidy, plus a 401(k) retirement plan with company matching for full-time employees.
Time Off & Workplace Benefits: Unlimited PTO, plus 13 paid holidays, 12 weeks of fully paid parental leave for all parents, a hybrid work model with a minimum of three in-office days per week, and access to high-quality office snacks, drinks, and equipment.
Cutting-Edge AI Tools for Productivity: Access to best-in-class AI tools, including Cursor, GPT models, Gemini, Claude, and other frontier AI systems to maximize engineering and execution efficiency.
Best-in-Class Equipment: Choice of top-spec laptops, high-performance workstation setups, and cutting-edge Plaud devices for all new hires.
Team & Culture: Annual company offsites, team events, and a culture that values craftsmanship, ownership, and velocity.