Senior Data Scientist leading design and execution of evaluation frameworks for generative AI systems at Resaro. Focusing on large language models, applying scientific methods to ensure AI safety and effectiveness.
Responsibilities
Lead the design, implementation, and execution of robust frameworks to evaluate the performance of generative AI systems, including text and multi-modal models
Establish and refine metrics and benchmarks for model quality, including output fidelity, diversity, creativity, and bias detection
Perform technical AI evaluations, benchmarking, and “red-team” tests on large language models to assess robustness, embedded biases, and vulnerabilities
Work with clients and junior team members to design custom evaluation approaches
Develop a suite of technical and analytical AI evaluation frameworks and tools assessing robustness, explainability, fairness, privacy, safety, and security of AI
Lead design and implementation of evaluation frameworks for Large Language Models (LLMs)
Define and refine metrics for evaluating model performance
Curate and manage large, high-quality datasets for evaluating LLMs
Mentor junior data scientists in best practices for LLM evaluation
Stay up-to-date with the latest advancements in Natural Language Processing (NLP) and LLM evaluation
Requirements
Extensive experience as a data scientist training or deploying deep learning based natural language models/large language models in real-world contexts
About 5-8 years of working experience, or a relevant postgraduate degree with 2+ years of working experience, building and deploying LLMs
Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques
Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization
Excellent written and verbal communication skills, with the ability to clearly explain complex technical concepts to diverse audiences, including non-technical stakeholders
Solid programming skills in Python and experience building automated pipelines for continuous model evaluation
Passion for and interest in applied research on the safe and responsible use of AI and large language models
Intern AI Researcher at Analog Devices exploring AI models for efficient edge computing. Collaborating with experts to drive breakthroughs in model optimization and compression.
Senior AI Researcher at Dolby developing audio and video technologies with a focus on deep learning. Partnering with experts to innovate in multimedia analysis, processing, and rendering.
AI Research Lead guiding development and evaluation of African-first open language models at GSMA. Collaborating with researchers and operators to ensure high-quality outcomes across diverse linguistic structures.
AI Scientist Intern focusing on automated speech processing for educational applications at Pearson. Involves mentoring, training ML models, and contributing to prototype ideas.
Lead AI initiatives in multimodal ML/AI at Eluvio AI Labs. Driving innovations for video understanding, content processing, and more in a hybrid environment.
Senior AI Scientist enhancing video and multimodal AI models for Eluvio AI Labs. Developing state-of-the-art models and impacting decentralized content AI and monetization.
AI / Machine Learning Researcher joining Planner 5D's AI team for applied research in home design. Collaborating on real-world product challenges with deep learning methods and data analysis.
Data & AI Scientist leveraging cutting-edge AI technologies to solve business problems for Lloyds Banking Group. Leading initiatives in foundation models and autonomous agents.
Data & AI Scientist leveraging cutting-edge AI technologies at Lloyds Banking Group to solve complex business problems. Collaborating closely with engineering and product teams in a hybrid working model.
Lead Specialist, AI Scientist focusing on AI transformation and full stack development in educational tech. Collaborating with diverse teams for innovative AI-driven solutions.