Data Engineer designing and building production data pipelines for AI and ML workloads at Capgemini Engineering. Focus on end-to-end data lifecycle management and AWS infrastructure.
Responsibilities
Design, build, and maintain research and production data pipelines spanning edge devices, cloud services, and centralized platforms
Own the full data lifecycle including collection, ingestion, processing, obfuscation, versioning, access, retention, and retirement
Develop resilient ingestion pipelines that handle device variability and connectivity challenges
Support secure data transfer from field environments to cloud storage
Collaborate with operations teams to improve data coverage, observability, and reliability
Implement privacy-preserving transformations and obfuscation pipelines
Build automated data cleaning and validation processes
Establish data lineage, retention policies, and access controls to ensure compliance and traceability
Provide scalable data services for training, evaluation, and research experimentation
Support continuous data refresh and retraining workflows
Build and optimize pipelines using AWS services such as S3, EC2, SageMaker, Lambda, Glue, and Step Functions
Requirements
Bachelor’s or master’s degree in computer science, data engineering, software engineering, or a related field
2-3+ years of experience building production data pipelines and data platforms for AI or ML systems
Strong proficiency in Python, C++, and distributed data processing frameworks
Hands-on experience with AWS services including S3, EC2, SageMaker, and Glue
Experience designing data systems that support large scale ML training and experimentation
Knowledge of data governance, access control, and lifecycle management
Experience working with ML, data science, operations, and cloud engineering teams
Benefits
Health insurance from the first day
Christmas holidays from 25 December to 31 December
Cooperation with the Superhumans Center and Veteran Hub
Psychological counseling provided by the Veteran Hub
Data Engineer at UBDS Group focusing on designing and optimizing modern data platforms. Collaborating in a multidisciplinary team to develop reliable data assets for analytics and operational use cases.
Data Engineer (dbt) at SDG Group involved in all phases of data projects. Collaborate on data ingestion, transformation, and visualization in a hybrid environment.
Data Consultant at SDG Group specializing in Data & Analytics projects. Collaborate on technical-functional definitions, ETL, data modeling, and visualization for cloud solutions.
Senior Data Engineer responsible for growing customer-defined targeting calculations and developing key/value databases for real-time data processing.
Data Engineer developing and maintaining the Data Lakehouse platform using Microsoft Azure technology stack at RBC. Collaborating with business and technology teams to enhance data ingestion and modeling processes.
Data Engineer focused on creating a data platform for automated cyber insurance. Collaborating with stakeholders to deliver data processing capabilities and governance.
Data Engineer building and maintaining data platform solutions for clients at Dignify. Designing, developing, and optimising data models and pipelines with a focus on Google BigQuery.
Data Engineer designing and developing data solutions using AI and machine learning for marketing applications. Collaborating in teams to create impactful data-driven solutions for clients across various industries.
Senior Data Engineer developing scalable data solutions for the electric vehicle market at Kempower. Collaborating with cross-functional teams to enhance data engineering processes.
Data Engineer responsible for developing research analytic data infrastructure at Sutter Health. Involves managing data quality, pipelines, and compliance with healthcare regulations.