Senior Software Engineer optimizing AI workloads using ML techniques at NVIDIA. Focus on performance optimization across large GPU and CPU clusters in AI systems.
Responsibilities
Design and implement resource allocation and combinatorial optimization techniques to optimize LLMs at datacenter scale.
Research, develop, and deploy AI/ML techniques to optimize large-scale Deep Learning training and inference on NVIDIA supercomputers and distributed systems.
Build and productionize ML-based tools for performance prediction and optimization.
Develop and deploy a scalable, reliable data curation pipeline capable of handling complex data types.
Collaborate across hardware and software teams to deliver valuable performance analysis insights.
Lead performance test planning and establish performance targets for new technologies and solutions.
Requirements
PhD or Master's degree in Computer Science, Software Engineering, or equivalent experience
4+ years of experience applying machine learning techniques to computer architecture and system optimization problems
Hands-on experience developing and deploying various learning algorithms (e.g., reinforcement learning, offline RL, supervised learning)
Proficiency in building and using ML models with leading frameworks such as PyTorch, TensorFlow, or JAX
Proven ability to apply GNN- or transformer-based optimization to PyTorch model graphs and Kineto execution traces
Expertise combining knowledge of NVIDIA GPUs, the CUDA library, and deep learning frameworks (TensorFlow/PyTorch) with networking concepts
Strong programming capabilities in Python, Bash, and C++.
A collaborative teammate with effective communication and interpersonal abilities.