Principal Software Engineer focusing on AI Inference, optimizing GPU performance and contributing to open-source projects. Collaborative role improving inference systems like vLLM and SGLang at NVIDIA.
Responsibilities
Drive upstream-first engineering in vLLM/SGLang: author and land PRs, engage in development discussions, help shape roadmaps, and build durable maintainer relationships.
Design and implement inference-runtime features that improve efficiency, latency, and tail behavior: request scheduling, batching policies, KV-cache management (paging/sharding), memory planning, and streaming.
Optimize core hot paths across the stack—from Python orchestration down to C++/CUDA kernels—using profiling and measurement to guide decisions.
Improve multi-GPU and multi-node inference: communication patterns, parallelism strategies (tensor/sequence/pipeline), and system-level scaling/efficiency.
Strengthen correctness, robustness, and operability: determinism where needed, graceful degradation, backpressure, observability hooks, and performance regression testing.
Collaborate across NVIDIA to integrate upstream advances with production needs (deployment patterns, compatibility, security posture) while keeping changes broadly adoptable by the community.
Mentor senior engineers, raise the technical bar through design reviews, and establish guidelines for performance engineering and upstream contribution workflows.
Requirements
15+ years building production software with significant depth in systems engineering
Strong track record of owning ambiguous, high-impact technical problems end-to-end
Demonstrated expertise in LLM inference/serving systems (e.g., vLLM, SGLang) and the tradeoffs that drive real production performance
Strong programming skills in Rust, C++, Python, CUDA; ability to read, modify, and optimize performance-critical code across layers
Experience with GPU performance analysis tools and methodologies (profiling, microbenchmarking, memory/comms analysis) and a strong measurement culture
Solid foundation in distributed systems and concurrency: queues/schedulers, RPC/streaming, multi-process/multi-threaded runtime behavior, and scaling patterns across nodes
Excellent communication skills; ability to influence across teams and represent NVIDIA well in open-source technical forums
BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience)
Principal Engineer responsible for enhancing service integrations at CDP Global, focusing on environmental impact. Collaborate with tech leads to align on integration standards and document architecture.
Software Development Engineer creating innovative features for Adobe Experience Manager product. Collaborating with global brands and applying AI experimentation in a creative software development role.
Fullstack Developer at MUFG, collaborating with senior technical teams to create innovative solutions. Responsible for application design, programming tasks, and deployments in a cloud environment.
Senior R&D Technical Leader partnering with marketing to drive adult and fem care innovation at Kimberly-Clark. Leading projects and aligning teams for enhanced product development and execution.
Senior Software Engineer developing scalable and high-performing applications for Rev's SaaS platform. Collaborating with cross-functional teams and mentoring junior developers with modern technologies.
Senior Software Engineer building and scaling Lambda’s IAM platform enabling secure access control. Designing core IAM capabilities and collaborating with cross-functional teams.
AI Software Engineer integrating commercial AI tools and agents into design flow at Broadcom. Responsible for optimizing performance and coordinating AI systems within a worldwide R&D team.
Principal Software Engineer developing scalable backend systems for Walmart's Digital Out of Home platform. Leading architecture, mentoring engineers, and guiding technical direction across thousands of retail locations.