Performance Engineer optimizing GPU training for foundation models in Heidelberg. Join a team focused on improving efficiency and effectiveness in AI training systems.
Responsibilities
Engineer the systems required to train foundation models at scale.
Maximize hardware utilization and training throughput on our large-scale GPU clusters.
Work at the intersection of deep learning frameworks, distributed systems, and GPU microarchitecture.
Requirements
You are proficient in Python and the PyTorch library.
You have a strong engineering background in parallel and/or distributed systems, with a proven track record of excellence.
You have hands-on experience with modern machine learning techniques (especially large language models and their life cycle).
You deeply understand the CUDA programming model.
You have experience in distributed programming with APIs like NCCL or MPI.
You have experience analysing profiling traces with tools such as PyTorch Profiler and NVIDIA Nsight.
Please note this role requires regular on-site collaboration in Heidelberg as a member of the Training Efficiency Team.
Benefits
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
JobRad® Bike Lease
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work-life balance