Data Scientist at Stefanini shaping LLM customization via data pipelines and sources. Engaging in data structuring, quality assurance, and efficient storage practices.
Responsibilities
Design and implement data pipelines to support the LLM customization process
Collect, process, and structure diverse data sources
Develop scripts and processes for extracting structured and unstructured data
Implement transformations to convert raw data into formats suitable for training
Ensure the quality, consistency, and relevance of the data used for training
Create mechanisms for validation and testing of datasets
Develop processes for data enrichment
Implement efficient storage for data and training results
Configure data integration between the trained model and the Elastic platform
Document data architecture, flows, and transformations
Implement data versioning and traceability practices
Optimize data flow for model training iterations
Ensure security and compliance in the handling of training data
Requirements
Additional courses in natural language processing or data preparation for ML (desirable)
Practical knowledge of the Elastic Stack platform (Elasticsearch, Logstash, Kibana) | Level: Advanced (Required)
Experience preparing datasets for training language models | Level: Advanced (Required)
Experience with extraction, transformation, and loading (ETL) of unstructured data | Level: Advanced (Required)
Benefits
Meal allowance or food voucher
Discounts on courses, universities, and language schools
Stefanini Academy — a platform with free, up-to-date online courses and certificates
Mentoring
Benefits club for consultations and medical exams
Health insurance
Dental insurance
Employee discounts and benefits at top establishments
Data Scientist developing machine learning solutions and delivering insights for operational decisions. Collaborating with stakeholders to apply analytical techniques and improve business outcomes.
Data Scientist responsible for modeling and analyzing credit risk at CAIXA Consórcio. Utilizing data-driven insights to support strategic decision-making in credit operations.
Data Scientist optimizing payments ecosystem for Preply, enhancing user experience through data-driven insights. Collaborating with teams to improve payment processes and fraud management.
Staff Data Scientist at Preply developing data strategies for product domains. Collaborating with executives to drive long-term strategy and experimentation frameworks.
Data Manager leading data strategy and governance for Global Payments Solutions at Bank of America. Managing data architecture aligning with business and regulatory needs while overseeing complex data ecosystems.
Data Scientist developing and implementing LLM-based agents and leveraging AI techniques to improve client value. Collaborating on project challenges in a dynamic, start-up environment at Gartner.
Data Scientist in AI SaaS integrating 100+ systems for a European unicorn-in-the-making. Ensure scalability, security, and performance in a high-growth environment.
Data Science Intern working on AI-driven recipe and hardware optimization problems in semiconductor processes. Developing machine learning models and collaborating with engineering teams for innovative solutions.
Senior Data Scientist at LexisNexis developing AI-driven solutions for legal analytics. Collaborating with teams to implement machine learning models and monitor performance metrics.
Product Analyst for a leading media tech company managing SEO-friendly content and commercial campaigns. Collaborate with teams for digital content production and user engagement analysis.