AI Software Engineer – Model Evaluation at Aleph Alpha

About the role

Senior AI Engineer who owns benchmarks and evaluations end to end at Aleph Alpha Research in Heidelberg. The role focuses on evaluating ML models, including their German-language capabilities, and carries full ownership within a hybrid working environment.

Responsibilities

Own benchmarks end-to-end: select, implement, and maintain the evaluation suite used during pre-training — from dataset curation to scoring infrastructure to result analysis.
Build evaluation infrastructure: develop and optimize the pipelines that run evaluations against training checkpoints, ensuring speed, reliability, and reproducibility.
Design aggregation and reporting: define how benchmark results translate into training decisions, and build the tooling that makes results interpretable.
Close capability gaps: work with product and post-training teams to identify where our models fall short, then create or integrate benchmarks that measure progress.
Own German evaluation: ensure rigorous assessment of German language capabilities — this is core to our value proposition, not an afterthought.
Correlate signals: establish which pre-training metrics actually predict downstream and system-level performance.

Requirements

Experience with LLM evaluation, benchmark design, evaluation dataset curation, and experimental design.
Familiarity with statistical methods for evaluation and experiment design.
Track record of shipping impactful technical work — whether that's research, infrastructure, or both.
Strong Python skills and comfort with ML tooling (PyTorch, evaluation frameworks, distributed systems).
Ability to reason about what an evaluation measures and whether it matters — not just run benchmarks, but understand them.
Ownership mentality: you see problems through from diagnosis to solution to deployment.
Willingness to relocate to Heidelberg or travel regularly (potentially weekly).

Benefits

30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work–life balance
Virtual Stock Option Plan
JobRad® Bike Lease
