AI Agent Evaluation Engineer developing evaluation frameworks for AI agents with an emphasis on safety and ethical standards. Collaborating with AI teams to ensure high-quality performance metrics.
Responsibilities
Evaluation (Evals) Development: Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
Responsible AI and Safety Evals (New Focus): Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
Test Strategy & Execution: Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing (UAT) specifically for conversational and goal-oriented AI agents.
Bug Detection & Management: Identify, document, prioritize, and track bugs, performance degradations, and alignment failures in agent behavior using Jira.
Automation & Tools: Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.
Requirements
Experience: 6+ years in Software QA, with at least 2 years focused on testing or evaluating AI/ML systems, conversational agents, or Large Language Models (LLMs).
Safety Evals Expertise (Mandatory): Direct experience in designing and executing safety evaluations (red teaming, adversarial testing), bias detection, and measuring toxicity/harmful content in generative AI models.
Agent/LLM Evals (Mandatory): Proven experience developing and running general evaluations (evals) for LLM-powered applications, including working knowledge of testing libraries such as pytest.
Google ADK Familiarity (Mandatory): Direct or strong conceptual understanding of the Google Agent Development Kit (ADK) and its components.
Programming: Strong proficiency in Python is mandatory for script development, data processing, and automation.
Cloud & MLOps: Familiarity with Google Cloud Platform (GCP) services relevant to AI/ML (e.g., Vertex AI) and integrating testing into MLOps workflows.
Tools and Libraries: LangSmith, DeepEval, Ragas, Giskard, Hugging Face.
Project Coordinator at Ledcor Technical Services managing telecom operations and project performance. Coordinating projects to ensure timely delivery while collaborating with teams and stakeholders.
Senior Project Manager leading medical device projects with a focus on compliance and quality assurance. Collaborating with cross-functional teams to ensure timely delivery and regulatory adherence.
Senior Coordinator supporting the execution of brand marketing campaigns for Paramount Pictures and Paramount+. Collaborating with creative teams to ensure timely campaign delivery.
Assistant Project Manager contributing to Quality/Safety/Environment projects at Kärcher. Engaging in regulatory compliance and performance improvement initiatives.
Project Manager overseeing HVAC operations at Mesa Energy Systems. Leading team performance and managing contracts to meet financial goals in Phoenix, AZ.
Project Manager focusing on onboarding new clients for IT management services at Atlas Technica. Delivering technical projects and client support in a fast-paced environment.
Project Management Officer at Barclays supporting project delivery and compliance with governance standards. Collaborating with change delivery managers to ensure successful project outcomes.
Project Manager focused on delivering individual projects within scope, time, and budget at Barclays. Collaborating with teams to ensure project success and effective stakeholder management.
Project Manager overseeing new franchise studio openings for JETSET Pilates. Collaborating across departments to ensure operational success and empower franchise owners.