Engineering Manager leading teams for observability platforms at LexisNexis. Owns operational excellence across the software delivery lifecycle in Raleigh, NC.
Responsibilities
Leads a team responsible for enterprise observability platforms and core development tooling, enabling fast detection, diagnosis, and resolution of production issues.
Owns the reliability, security, scalability, performance, and instrumentation of CI/CD and developer platforms to improve system reliability and developer productivity at scale.
Partners with engineering, SRE/Operations, and Security to embed observability and operational excellence across the software delivery lifecycle.
Hire, coach, set priorities, and build a culture of reliability, ownership, learning, and continuous improvement.
Define vision/roadmap and standards for metrics, logs, traces, alerting, dashboards, and service health; promote early instrumentation and SLI/SLO-based practices.
Provide governance and strategic oversight for Azure DevOps, GitHub Enterprise, Jenkins, and related tooling; define guardrails for repos, branching, pipelines, build agents, and integrations.
Partner on incident response, RCA, and post-mortems; improve detection, triage, rollback, recovery, and on-call readiness.
Manage platforms via Infrastructure as Code; standardize configurations and operational practices; evaluate tools with an eye to capability, complexity, risk, and cost.
Ensure auditability, access controls/reviews, and logging meet enterprise requirements; optimize platform spend without degrading reliability or developer experience.
Convert technical information into business value and coordinate stakeholders across application, platform, and infrastructure teams.
Requirements
Experience leading engineering teams in Observability/SRE/Platform/DevOps Tools.
Hands-on background with observability platforms (e.g., Datadog, Splunk, New Relic, Grafana, OpenTelemetry).
Perception Deployment Engineer deploying deep learning models on embedded systems at Caterpillar. Collaborating with cross-functional teams for integration and optimization of perception modules in vehicles.
Principal Site Reliability Engineer at AT&T designing scalable solutions for critical operations with minimal downtime. Collaborating with teams to monitor and improve system performance in cloud environments.
DevOps Engineer managing AI SaaS infrastructure at a high-growth European company. Supporting AI model deployment and ensuring platform security and compliance across multiple system integrations.
Reliability Engineer optimizing site facility infrastructure and utility systems at Roche. Conducting root cause analyses and developing maintenance plans to enhance reliability and efficiency.
DevOps SME designing, implementing, and operating multi-cloud platforms for The Missing Link. Collaborating with engineering, security, and operations teams while embedding DevOps best practices.
Site Reliability Engineer improving reliability of cloud infrastructure for an AI-specialized company. Taking ownership of monitoring and incident response processes in a hybrid working arrangement.
DevOps Engineer leading automation for sophisticated release/deployment pipelines at Securonix. Focused on Python, Ansible, and cloud services to enhance security operations.
Senior Analyst on Data Platform DevOps at AIMCo, responsible for building data operations and collaborating with teams on innovative solutions. Focused on ensuring data quality and integrity across technologies.
Principal Engineer driving systemic reliability improvements for Xero's software products. Leading technical initiatives and mentoring teams in engineering excellence.
DevOps Engineer at Constantinople enhancing release processes for the AI-native banking platform. Collaborating across teams to ensure CI/CD pipeline reliability and operational efficiency in the APAC timezone.