Senior AI Scientist at Resaro, developing evaluation frameworks for generative AI systems. The role involves applied deep learning, mentoring, and refining model performance metrics.
Responsibilities
Lead the design, implementation, and execution of robust frameworks to evaluate the performance of generative AI systems, including text and multi-modal models.
Establish and refine metrics and benchmarks for model quality, including output fidelity, diversity, creativity, and bias detection.
Perform technical AI evaluations, benchmarking, and “red-team” tests on large language models, assessing them for robustness, embedded biases, and vulnerability to jailbreaks and prompt injection attacks.
Work with clients and junior team members to design custom evaluation approaches, grounded in the latest scientific research, that address each client’s needs.
Work with the product management team to develop a suite of technical and analytical AI evaluation frameworks and tools that are backed by scientific research and methods.
Lead the design and implementation of evaluation frameworks for Large Language Models (LLMs), including but not limited to GPT-based models, BERT, T5, and other state-of-the-art architectures.
Define and refine metrics for evaluating model performance, such as perplexity, BLEU, ROUGE, accuracy, coherence, factual consistency, and bias detection.
Lead efforts in curating and managing large, high-quality datasets for evaluating LLMs, ensuring data is representative, unbiased, and ethically sourced.
Mentor junior data scientists, guiding them in best practices for LLM evaluation and the latest advancements in NLP.
Stay up to date with the latest advancements in Natural Language Processing (NLP) and LLM evaluation, applying cutting-edge methods and tools to improve model performance.
Requirements
Extensive experience as a data scientist training or deploying deep-learning-based natural language models or large language models in real-world contexts.
About 5-8 years of work experience, or a relevant postgraduate degree with 2+ years of work experience, building and deploying LLMs.
Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques.
Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization.
Excellent written and verbal communication skills, with the ability to clearly explain complex technical concepts to diverse audiences, including non-technical stakeholders.
Solid programming skills in Python and experience building automated pipelines for continuous model evaluation.
Passion for and interest in applied research on the safe and responsible use of AI, including large language models.
Nice to Have
Published research in the field of generative AI or model evaluation.
Hands-on experience with model explainability tools and methods.
Familiarity with cloud-based platforms (e.g., AWS, GCP) for scalable model evaluation and deployment.