Senior AI Software Engineer, LLM Inference Performance Analysis at NVIDIA | Hybrid Hired

About the role

Senior AI Software Engineer focused on optimizing LLM inference performance at NVIDIA. The role involves collaborating with compiler, hardware, kernel, and framework teams to assess bottlenecks and validate improvements to compiler and runtime efficiency.

Responsibilities

Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
Find opportunities for performance improvements in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations.
Build and develop new compiler passes and optimization techniques to deliver outstanding, robust, and maintainable compiler infrastructure and tools.
Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.

Requirements

Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years relevant experience.
Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
Expertise in current LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
Significant background in kernel optimization through intermediate representation (IR) and code generation, including graph transformations, fusion, scheduling, and custom kernel generation with frameworks such as OpenAI Triton or other compiler-based code generation pipelines.
Hands-on experience with deep learning frameworks like TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.

Benefits

Equity
Benefits
