HPC scheduler/resource manager engineer crafting scheduling strategies for large datacenter clusters. Driving cutting-edge innovations in AI and GPU computing with top scientific partners and technologies.
Responsibilities
Provide engineering solutions and prototypes to enable efficient resource management and job scheduling for large scale clusters
Drive next-generation requirements and features for schedulers in at-scale clusters
Maintain technical relationships with internal and external engineering teams
Assist system architects and machine learning/deep learning engineers in building creative solutions based on NVIDIA technology
Be an internal reference for scheduling and resource management concepts and methodologies among the NVIDIA technical community
Test, evaluate, and benchmark new technologies and products and work with vendors, partners and peers to improve functionality and optimize performance
Requirements
BS, MS, or PhD in Engineering, Mathematics, Physics, Computer Science, or equivalent experience
12+ years of experience designing and running scheduling and resource management systems in large datacenter/AI/HPC solutions
Knowledge of and experience with resource management and scheduling code bases: SLURM preferred, or other implementations (LSF, SGE, Torque...)
Proven understanding of performance clusters, infrastructure and workload patterns
Experience using and installing Linux-based server platforms
Aerospace Engineer employing expertise to manage ICBM programs at Booz Allen. Ensuring the safety of next-generation nuclear weapon systems through risk assessments and technology evaluation.
Principal Electrical Controls Engineer leading control system development for data centers. Focus on automation, reliability, and integration of critical infrastructure systems.
Highways Engineer role focused on highway and drainage project delivery for Mott MacDonald. Collaborating with cross-functional teams to develop compliance-driven design solutions.
HyCO Plant Process Engineer providing process design and technical support to Southeast Asia plants. Maintaining efficiency and reliability while driving productivity and safety initiatives.
Services Engineer providing technical support and ensuring customer satisfaction in NYC metro area. Solving technical issues and managing service orders for Technogym products.
Services Engineer responsible for customer satisfaction and equipment care at Technogym. Executing service operations and maintaining technical standards in San Francisco area.
Services Engineer in the Austin metro area for Technogym, focusing on customer satisfaction and equipment service duties. Resolving customer requests and managing service orders effectively.
Senior-level Manufacturing Engineer at Arjo developing processes for reprocessing medical devices. Implementing improvements and managing technology projects in a state-of-the-art facility in Everett, WA.
Process Engineer developing manufacturing processes for energetic products at Northrop Grumman. Involved in troubleshooting and process improvement for energetic production.