Support AI and DevOps platforms at Citi Finance, ensuring operational stability and effective incident resolution, while collaborating with engineering teams.
Responsibilities
Demonstrate a strong understanding of how application support contributes to the overall technology function and organizational objectives.
Assist with vendor relationship management, including coordination with offshore managed services.
Support efforts to improve service levels for end users by enhancing operational efficiencies and strengthening incident management, problem management, and knowledge-sharing practices.
Partner with development teams to guide improvements in application stability and supportability.
Contribute to frameworks for managing capacity, throughput, and latency.
Assist in defining and implementing application onboarding guidelines and standards.
Support team members by fostering a collaborative environment and encouraging skill development.
Participate in cost-reduction efforts through Root Cause Analysis reviews, knowledge management, performance tuning, and user training.
Participate in business review meetings to help align technology tools and strategies with business requirements.
Ensure adherence to support processes and tool standards, and assist in enhancing processes to promote consistency and quality across the support program.
Perform other duties and functions as assigned.
Support platform leadership in defining the platform roadmap and partnering with engineering teams and business stakeholders.
Assist in executing resilience activities such as wargaming scenarios, chaos engineering tests, and disaster recovery drills.
Contribute to automation initiatives aimed at reducing manual toil and improving platform efficiency.
Support the enterprise-wide observability strategy, including monitoring, logging, tracing, and alerting.
Maintain hands-on familiarity with platform architecture and services as needed for operational support.
Assist in overseeing the operational health of production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are supported and incident processes are followed.
Help implement and operate effective monitoring and observability strategies to support proactive issue detection and system health assessments.
Requirements
5–7 years of relevant experience in a hands-on technical or support leadership role.
Experience contributing to architecture discussions and ensuring solutions align with enterprise standards and long-term maintainability.
Experience working with senior stakeholders or technology partners.
Demonstrated experience supporting IT service improvements or platform stability initiatives.
Strong communication and presentation skills, with the ability to convey technical concepts clearly.
Experience supporting or contributing to technical roadmaps or operational workstreams.
Experience participating in resilience-related activities such as incident simulations, disaster recovery exercises, or stability testing.
Ability to collaborate with cross-functional support teams and technology groups.
Strong organizational and workload-planning skills.
Clear and concise written and verbal communication skills, demonstrated consistently.
Ability to communicate appropriately with relevant stakeholders.
Working knowledge of Generative AI concepts preferred.
Experience with CI/CD and configuration management tools preferred.
Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
Experience writing or maintaining code in Java, Python, Go, or similar languages preferred.
Hands-on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.