Software Engineer contributing to the observability team's development of visibility systems. Implementing a high-performance telemetry platform and supporting AI tools for engineering teams.
Responsibilities
Contribute to the development and maintenance of the systems that provide visibility into Shipt’s technical ecosystem.
Implement and support a high-performance telemetry platform for engineering teams to monitor, debug, and optimize their services effectively.
Work closely with senior engineers to ensure metrics, logs, and traces are reliable and actionable.
Help bridge the gap between traditional monitoring and intelligent diagnostics.
Utilize and support AI-enhanced tools and interfaces to streamline interaction with telemetry data.
Help integrate autonomous agents and predictive models into workflows for a proactive, self-healing infrastructure environment.
Requirements
Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).
3+ years of professional experience in software engineering, with exposure to observability, SRE, or infrastructure-focused roles.
Proficiency in at least one programming language, such as Golang or Python, for building and automating infrastructure tooling.
Familiarity with modern log, metric, and trace observability stacks, such as Prometheus, OpenTelemetry, structured logging, or similar tooling.
Basic understanding of how AI and machine learning can be applied to time-series data for anomaly detection.
Experience working with containerized environments like Kubernetes and cloud platforms like GCP.
Strong analytical and problem-solving skills, with a commitment to providing high-quality visibility for engineering teams.
Benefits
Employees (and eligible family members) are covered by medical, dental, vision and more.
Employees may enroll in our company’s 401(k) plan.
Exempt team members are also eligible for discretionary vacation.
Employees receive paid holidays throughout the calendar year and paid sick leave.
Other compensation includes eligibility for an annual bonus and the potential for restricted stock units based on role.