Production Support Engineer II providing support for business-critical systems while ensuring operational stability. Resolving incidents, maintaining system health, and collaborating with engineering teams.
Responsibilities
Provide day-to-day support for business-critical systems, ensuring operational stability.
Resolve low- to medium-priority incidents and maintain system health.
Support the improvement of production environments through collaboration with senior engineers and cross-functional teams.
Identify, troubleshoot, and resolve low- to medium-priority technical issues with guidance from senior engineers.
Monitor system performance day to day, using monitoring tools to detect anomalies and take corrective action.
Collaborate with cross-functional teams to resolve technical incidents and escalate higher-complexity issues to senior engineers as needed.
Assist in automating routine production support tasks by developing or modifying scripts and tools.
Maintain documentation for production issues, troubleshooting steps, and system configurations, contributing to the shared knowledge base.
Participate in incident, problem, and change management processes, following ITIL best practices.
Perform root cause analysis for recurring issues and assist senior engineers in implementing permanent fixes to improve system stability.
Support the implementation of process improvements to enhance system performance and minimize downtime.
Assist with mentoring and supporting junior-level engineers, providing guidance as needed.
Requirements
Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
Four to eight years of experience in production support, systems engineering, database engineering, or related technical roles.
Experience with IT Service Management (ITSM) tools such as ServiceNow, with a solid understanding of incident, problem, and change management processes.
Proficiency in using monitoring tools like Splunk, Dynatrace, or CloudWatch to detect and resolve system performance issues.
Strong analytical and problem-solving skills, with the ability to assist in root cause analysis and incident resolution.
Ability to work independently on low- to medium-priority incidents and escalate complex issues when necessary.
Experienced Production Engineer supporting quality-critical processes and collaborating with teams to ensure high-quality pen needles. Engaging in stable operations and improvements within a 2-year temporary contract.
Production Support Engineer ensuring system stability and reliability for Manulife's critical services. Collaborative role bridging development and infrastructure, providing seamless service for customers.
Senior Production Engineer (SRE) at Legion building and operating a secure AWS/Kubernetes platform. Focused on automation, reliability, and infrastructure as code.
Production Engineer managing database operations at Palantir, ensuring reliability and availability of data systems. Involved in architecture, design, and maintenance of production databases in various environments.
Production Engineer PCB managing first-line technical support for PCB assembly processes. Assisting with product introduction and implementing process improvements in a leading transport solutions company.
Senior Production Support / DevOps Engineer at Keyrus focusing on application reliability and cloud operations. Support enterprise Java-based platforms in collaboration with development teams.
Lead Production Engineer managing production optimization initiatives across the enterprise for oil and gas. Act as the key authority in autonomous and semi‑autonomous production engineering standards.
Production Engineer in open pit mining at St Ives Gold Mine. Responsible for drill and blast designs aligning with production plans and continuous improvement.
Production Engineer ensuring compliance with manufacturing procedures and standards at Galderma. Optimizing production processes and supporting autonomous work cells for operational improvements.
Production Support Engineer ensuring reliability of Ruby on Rails platform at HHAeXchange. Supporting operational health and handling incident response for production systems.