Site Reliability Engineer ensuring performance, scalability, and security of production environments at FIS. Collaborating on resilient, self-service platforms for fintech solutions.
Responsibilities
Manage, tune, and support enterprise environments across:
AIX: LPARs, PowerVM/VIOS, NIM, storage, tuning
Linux: RHEL, SUSE, Ubuntu (performance, security hardening, system services)
Windows Server: IIS administration, clustering, GPO, patching
Support and optimize:
Oracle databases (RAC, Data Guard, RMAN, SQLNet/capacity tuning)
IIS web infrastructure (app and thread pools, SSL/TLS, ARR, logs)
Load balancers (F5 BIG-IP, HAProxy: monitors, iRules/policies)
Akamai CDN/WAF, caching, edge configuration
Ensure operational excellence in backup/restore, patching, DR, and capacity management
Provide enterprise application support (mission‑critical in‑house systems—performance, reliability, release operations)
Own, manage, and execute deployments across UAT, Production and DR. Maintain and optimize deployment runbooks, build artifacts, and environment promotion workflows.
Implement safe‑deployment strategies including blue/green, canary, rolling, and feature‑flag‑based releases.
Coordinate with development and DevOps teams to ensure deployment readiness, including configuration, dependencies, and release validation.
Troubleshoot deployment issues, manage rollbacks, and ensure post‑release stability.
Enhance and maintain CI/CD pipelines to improve deployment predictability, reliability, and auditability.
Integrate deployment telemetry into observability tools to detect release-related anomalies early.
Enforce deployment quality gates, configuration consistency, and compliance requirements.
Support continuous improvement of release processes, reducing manual steps and eliminating deployment toil.
Monitor system health and respond to incidents with a focus on rapid recovery, root-cause analysis, and long-term remediation.
Administer and optimize monitoring and observability tools including Splunk, Dynatrace, BigPanda, Zabbix, SiteScope, Wireshark and Idera.
Build and maintain robust logging, metrics and tracing stacks. Develop dashboards, alerts, and automated remediation workflows. Drive post‑incident reviews and continuous improvement initiatives.
Conduct capacity planning and performance tuning across infrastructure and applications. Identify systemic issues and architect resilient solutions. Collaborate with engineering teams to optimize systems for reliability and performance.
Automate provisioning and operations using Ansible, PowerShell, Bash, and Python. Implement Infrastructure-as-Code using Terraform/Ansible. Build internal self-service tools to reduce manual work.
Administer IBM Tivoli Workload Scheduler (TWS): workload automation, job streams, monitoring.
Experience in high-compliance environments (SOC 2, HIPAA, FedRAMP, ISO 27001). Partner with security teams to remediate vulnerabilities and ensure environment hardening.
Support compliance requirements through automation, logging, and operational controls. Follow ITIL processes for change, incident, and problem management.
Partner with DevOps, application, database, and network teams. Maintain documentation, runbooks, diagrams, and standards. Contribute to release planning, environment readiness, and cross‑team coordination.
Requirements
10+ years of experience in SRE, DevOps, or platform engineering roles
Proven experience building resilient, scalable, and highly available systems
Expertise with AIX, IBM mainframe, Linux, Windows, Oracle, IIS, F5 load balancers, Akamai
Experience with Splunk, Dynatrace, BigPanda, Zabbix, SiteScope, Idera, and IBM TWS
Strong scripting and automation skills (Python, Bash, PowerShell, Go).
Familiarity with common enterprise application architectures, including multi‑tier (web/app/db), service‑oriented architecture (SOA), microservices, message‑driven/event‑driven systems, API‑centric integration patterns, and distributed system design principles.
Ability to translate complex technical concepts into clear, business-friendly language and to communicate expectations, risks, and solutions effectively to clients.
Ability to communicate effectively with clients at all levels, technical and non-technical: building trust, understanding their goals, constraints, and success criteria, and proactively managing expectations through clear, timely, and transparent dialogue.
Commitment to continuous improvement.
Ability to investigate issues across multi-layered systems, identify root causes, and anticipate blockers before they occur.
Strong collaboration with cross-functional teams (dev, ops, security, product).