Hybrid AI Engineer – LLM Ops, Evaluation

Posted yesterday

About the role

  • Auxilius.ai builds AI solutions for Governance, Risk and Compliance (GRC). As our AI Engineer for LLMOps & Evaluation, you will own the LLMOps pipeline and drive prompt engineering in a hybrid work environment.

Responsibilities

  • Own the LLMOps pipeline: Evaluation infrastructure, the prompt optimization loop, and the production integration that turns experiments into reliable customer-facing features
  • Design evaluation strategy per output type: Decide when to use deterministic evals (exact match, schema validation, embeddings) vs. LLM-as-judge, and build the rubrics, test datasets, and human-review loops that make the system trustworthy
  • Drive prompt engineering and optimization across all LLM operations in the product: Moving from hand-tuned prompts to a measurable, iterative process
  • Pick the right tool for each problem: Some things are LLM problems, some are embedding + classical NLP problems, some are deterministic logic
  • Run the production side of AI features: Observability (Langfuse, LangSmith, or similar), cost and latency engineering, and incident response when an LLM feature degrades
  • Build human-in-the-loop workflows: Review queues, feedback ingestion, and labeling, so that production signal feeds back into evals and prompt iteration
  • Mentor our AI & Analytics Intern and contribute to how we build the AI team over time
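To make the "deterministic evals" mentioned above concrete, here is a minimal sketch of the kind of check an evaluation pipeline might run before falling back to LLM-as-judge. The function names and the required-keys schema are illustrative assumptions, not part of Auxilius.ai's actual stack:

```python
import json


def exact_match(output: str, expected: str) -> bool:
    """Deterministic eval: normalized string comparison."""
    return output.strip().lower() == expected.strip().lower()


def schema_valid(output: str, required_keys: set[str]) -> bool:
    """Deterministic eval: does the model output parse as JSON
    and contain every key the downstream feature expects?"""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()
```

Checks like these are cheap, reproducible, and free of judge variance, which is why they typically run first; only outputs that pass structural validation need the more expensive rubric-based review.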

Requirements

  • 3+ years of hands-on experience building and shipping ML/AI systems in production (we care more about what you've shipped than years on a CV)
  • Have shipped an LLM evaluation or prompt optimization pipeline (not just used LLMs in a project, but owned the loop)
  • Strong hands-on experience with LLM-as-judge, including its variance problems and concrete techniques for controlling them
  • Solid foundation in classical NLP and ML ops: Embeddings, semantic similarity, entity matching, classification, fuzzy matching
  • Informed opinions on deterministic vs. LLM-based evals, from experience
  • Production judgment: You've owned cost and latency tradeoffs, observability, and incident response for an LLM-powered feature. You're familiar with prompt regression and have strategies for managing it
  • Strong Python
  • Excellent English communication, written and verbal: We discuss nuanced technical tradeoffs daily with the founding team and customers
  • Comfort with ambiguity: You can run experiments on real data, build intuition for this domain, and know when to stop iterating
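As an example of the "classical NLP" baseline skills listed above, fuzzy entity matching can often be handled without an LLM at all. This is a minimal stdlib sketch (the function name and threshold are illustrative assumptions):

```python
from difflib import SequenceMatcher


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Classical-NLP baseline: character-level similarity ratio
    after simple normalization, compared against a threshold."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold
```

Knowing when a deterministic matcher like this suffices, and when a problem genuinely needs embeddings or an LLM, is exactly the "right tool for each problem" judgment the role calls for.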

Benefits

  • Hands-on ownership of a real AI product used by enterprise customers
  • Work directly alongside the founding team from day one
  • Hybrid work model: Munich North, minimum one day per week in the office, otherwise flexible (open to strong candidates elsewhere in the EU for the right fit); onboarding will take place in-office
  • A steep learning curve at the intersection of LLM engineering, enterprise GRC, and startup operations
  • The chance to shape the AI team as we grow

Job title

AI Engineer – LLM Ops, Evaluation

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
