Data Engineer designing, developing, and maintaining data products by liaising with stakeholders. Requires strong skills in Python, SQL, and big data technologies.
Responsibilities
Work with stakeholders to understand the data requirements to design, develop, and maintain complex ETL processes.
Create the data integration and data diagram documentation.
Lead the data validation, UAT and regression test for new data asset creation.
Create and maintain data models, including schema design and optimization.
Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
Requirements
Strong knowledge on Python and Pyspark
Expectation is to have ability to write Pyspark scripts for developing data workflows.
Strong knowledge on SQL, Hadoop, Hive, Azure, Databricks and Greenplum
Expectation is to write SQL to query metadata and tables from different data management system such as, Oracle, Hive, Databricks and Greenplum.
Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
Expectation is to use Hue and run Hive SQL queries, schedule Apache Oozie jobs to automate the data workflows.
Good working experience of communicating with the stakeholders and collaborate effectively with the business team for data testing.
Expectation is to have strong problem-solving and troubleshooting skills.
Expectation is to establish comprehensive data quality test cases, procedures and implement automated data validation processes.
Degree in Data Science, Statistics, Computer Science or other related fields or an equivalent combination of education and experience.
5-7 years of experience in Data Engineer.
Proficiency in programming languages commonly used in data engineering, such as Python, Pyspark, SQL.
Experience in Azure cloud computing platform, such as developing ETL processes using Azure Data Factory, big data processing and analytics with Azure Databricks.
Strong communication, problem solving and analytical skills with the ability to do time management and multi-tasking with attention to detail and accuracy.
Data Warehouse Modelling Engineer designing and maintaining data models using Data Vault 2.0 for iGaming industry. Collaborating with stakeholders and optimizing data models in a hybrid work environment.
Senior Data Engineer driving impactful data solutions for the climate logistics startup HIVED's core data platform. Collaborating with cross - functional squads to enhance analytics and delivery.
Data Engineer developing and maintaining CRE forecasting infrastructure for Cushman & Wakefield. Collaborates with senior economists and technical teams to ensure high - quality data solutions.
Data Engineer at PwC, engaging with Azure cloud services to enhance data handling and integrity. Responsibilities include pipeline optimizations, documentation, and collaboration with stakeholders.
Data Engineer Manager at PwC focusing on building data infrastructure and solutions. Leading data engineering projects to transform raw data into actionable insights and drive business growth.
Junior Data Engineer at OneMarketData focusing on data quality and integrity in financial datasets. Collaborating with senior analysts and assisting in data management and analysis tasks.
Senior Data Engineering Analyst developing and implementing data solutions. Collaborating in a diverse environment focused on data processing and analysis for clients' digital transformation.
Principal Software Engineer in Threat Data Platform developing AI - driven tools for threat intelligence automation. Collaborating on robust data pipelines for PANW’s product ecosystem.