Production Support & Monitoring Engineer ensuring reliability, performance, and availability for Exegy's production systems. Collaborating with teams to resolve incidents and optimize environments.
Responsibilities
Monitor production systems and infrastructure, ensuring uptime and performance metrics are met
Troubleshoot, diagnose, and resolve production issues in real time, minimizing service impact
Manage incident response, including escalation, root cause analysis, and post-mortem reporting
Collaborate with engineering teams to develop and implement monitoring tools, alert systems, and automated recovery processes
Analyze system logs, metrics, and trends to proactively identify potential risks or issues
Execute software deployments, configuration changes, and system upgrades with minimal disruption
Maintain and refine operational runbooks, escalation procedures, and best practices
Drive continuous improvement by identifying areas for process optimization and operational efficiency
Participate in an on-call rotation to provide 24/7 support for production systems
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience
2+ years of experience in a production support, system administration, or monitoring role
Strong technical skills in Linux/Unix environments, with experience in troubleshooting and debugging
Hands-on experience with monitoring tools (e.g., ITRS, Prometheus, Grafana, Splunk) and incident management platforms
Scripting experience (e.g., Python, Bash) to automate monitoring and reporting tasks
Excellent problem-solving and analytical skills, with the ability to work under pressure in a fast-paced environment
Solid understanding of networking, system performance, and application monitoring concepts
Exceptional communication and collaboration skills to coordinate with cross-functional teams effectively
Reverse Engineer at Teller building APIs that connect apps to users' financial accounts. Helping crack mobile banking applications for seamless bank integrations.
Project Engineer supporting construction project teams at Fessler & Bowman. Assisting with project planning, scheduling, and management across multiple construction sites.
Lead Engineer developing AI-powered features for FIS’s cloud-based financial platform, collaborating with teams and mentoring junior engineers for architectural excellence.
Controls Engineer designing and maintaining control systems for manufacturing equipment. Involved in troubleshooting and onsite servicing for optimal operations.
Tier III VTC Engineer providing technical expertise for AT&T at customer site in Virginia. Responsible for video teleconferencing troubleshooting, installation, and design at various locations.
Lead Knowledge Engineer at S&P Global driving data transformation initiatives. Collaborating with technology teams to implement next-generation data architecture and knowledge management solutions.
Part 21 Electrical / Avionics Engineer at Boeing responsible for compliance with regulatory requirements. Supporting certification of modifications for global airline partners and collaborating with engineering teams.
Engineer designing, developing, and testing nuclear equipment and systems for Navy ships at Newport News Shipbuilding. Collaborating on safety, efficiency, and performance improvements while conducting relevant research and analysis.
Senior Forward Deployed Engineer embedding in strategic aviation operations to drive measurable impact. Working with airlines and MROs while ensuring successful adoption of AI-driven solutions and product enhancements.
Senior Geotechnical Engineer providing technical leadership and developing engineering solutions for mining projects. Collaborating with teams to ensure compliance and excellence in geotechnical engineering.