AI Evaluation Engineer at Factorial | Hybrid Hired

About the role

As an AI Evaluation Engineer, you will develop methodologies for assessing the performance and reliability of advanced AI systems. This is a hybrid role based in Ghent, Belgium.

Responsibilities

Design and Develop Evaluation Frameworks: Create scalable, reproducible evaluation pipelines for large-scale AI systems, including LLMs and multi-agent architectures, covering both automated and human-in-the-loop testing strategies.
Metric Innovation: Define and implement novel evaluation metrics that capture model capabilities beyond traditional benchmarks.
Benchmarking & Performance Analysis: Benchmark AI models across domains, tasks, and modalities, analyzing their skills and behavior under different setups.
Safety, Reliability & Alignment Testing: Develop tools and experiments to probe model safety, robustness, interpretability, and bias.
Cross-functional Collaboration: Work closely with model fine-tuning and optimization teams to evaluate end-to-end system effectiveness and efficiency. Identify trade-offs between model performance, latency, and energy footprint.
Continuous Improvement & Reporting: Monitor model performance over time, automate regression detection, and contribute to the continuous evaluation infrastructure that supports Openchip’s AI research and product roadmap.

Requirements

MSc or PhD in Computer Science, Artificial Intelligence, Machine Learning, Statistics, or a related field.
A publication record in ML evaluation, benchmarking, or interpretability is a plus.
3+ years of experience developing, evaluating, or optimizing AI systems.
Strong programming skills in Python, with experience using PyTorch, TensorFlow, or JAX.
Experience in designing evaluation protocols for LLMs, multi-agent systems, or reinforcement learning environments.
Deep understanding of ML metrics, evaluation methodologies, and statistical analysis.
Experience with data quality, annotation workflows, and benchmark dataset creation is a plus.
Fluent in English; proficiency in additional European languages (German, Dutch, Spanish, French, or Italian) is a plus.

Benefits

The opportunity to build a cloud AI deployment platform that will power next-generation AI systems.
A collaborative, innovation-driven environment with significant autonomy and ownership.
Hybrid work model with flexible scheduling.
A chance to join one of Europe’s most ambitious companies at the intersection of AI and silicon engineering.
