About the role

Director for SRE supporting Fidelity’s growing public cloud presence and delivering reliable runtimes for business-critical workloads. Leads diverse technical teams to enhance cloud management capabilities and customer value.

Responsibilities

The Fidelity Enterprise Infrastructure (EI) Production Support team is seeking a Director to help scale our growing public cloud presence.
Fidelity’s Site Reliability Engineers work with our cloud platform teams to deliver reliable runtimes for Fidelity’s business-critical workloads.
This team is responsible for cross-cutting cloud management capabilities and is the expert on the state of Fidelity’s cloud platforms at any moment.
The team comes from diverse technical backgrounds, and the role offers a variety of challenges that require engineers to work on both software and systems problems.
Ideal candidates will have previous experience as an SRE, or a background in either software engineering or systems engineering with a desire to learn the other.
The Director for SRE will support Engineering and Systems Operational support for Business Unit aligned functions including Application Support, Cloud Enablement, Helpdesk, Environment Management, Mid-tier & Web Operations, and Platform Engineering.
By demonstrating and promoting Fidelity and agile leadership behaviors, you will evolve and sustain an innovative agile culture.
Our ever-evolving technology stack ensures a phenomenal learning culture in the team.
We are always exploring new technologies and new ways to continually provide value to our customers.
This team has a direct and positive impact on Fidelity’s customers.

Requirements

Ability to automate with various scripting languages (Python, Shell scripting, etc.)
Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
Hands-on Kubernetes skills and knowledge
Hands-on experience with Cloud services on AWS and Azure
Experience building resiliency with Chaos Engineering practices
Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
Experienced in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
Proven experience in maintaining scalability and resiliency of complex environments.
Proven experience in implementing advanced observability practices and techniques at scale.
Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, Splunk)
Ability to triage, execute root cause analysis, and be decisive under pressure.
Experience managing and interpreting large datasets using query languages and visualization tools.
Proficient communication skills with an ability to reach both technical and non-technical audiences.
Ability to learn new software, methods, and practices and bring them to our developers.
Ability to work constructively and collaboratively with a variety of individuals and groups, both in person and virtually, and to build and maintain effective relationships.
Bridges the gap between lofty architecture ideas and development of feasible solutions.
Facilitates discussions among component owners to improve end-to-end understanding of transaction paths.
Provides consulting to architects and developers on common patterns and tactical, reusable solutions.
Influences adoption of stability principles by presenting facts and data.
Drives operational readiness discussions and reviews of new solutions and products.
Develops frameworks for self-assessment of applications on various stability and dependability pillars.
Participates, even unsolicited, in discussions and decisions that impact customer experience.
Selectively preserves and shares collective memory and successes of the past.
Mindset of continuous learning and experimentation.
Instinctive urge to improve current state by finding problems and recommending feasible solutions.

Benefits

Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office. This does not apply to Remote or fully Onsite roles.

Hybrid Director, Site Reliability Engineering

at skillventory - A Leading Talent Research Firm
