Principal Data Engineer modernizing cloud-native platforms for AI-powered solutions at Mastercard. Leading teams to enhance data processing efficiency and reliability across global operations.
Responsibilities
Drive modernization from legacy and on-prem systems to modern, cloud-native, and hybrid data platforms.
Architect and lead the development of a Multi-Agent ETL Platform for batch and event streaming, integrating AI agents to autonomously manage ETL tasks such as data discovery, schema mapping, and error resolution.
Define and implement data ingestion, transformation, and delivery pipelines using scalable frameworks (e.g., Apache Airflow, NiFi, dbt, Spark, Kafka, or Dagster).
Leverage LLMs and agent frameworks (e.g., LangChain, CrewAI, AutoGen) to automate pipeline management and monitoring.
Ensure robust data governance, cataloging, versioning, and lineage tracking across the ETL platform.
Define project roadmaps, KPIs, and performance metrics for platform efficiency and data reliability.
Establish and enforce best practices in data quality, CI/CD for data pipelines, and observability.
Collaborate closely with cross-functional teams (Data Science, Analytics, and Application Development) to understand requirements and deliver efficient data ingestion and processing workflows.
Establish and enforce best practices, automation standards, and monitoring frameworks to ensure the platform’s reliability, scalability, and security.
Build relationships and communicate effectively with internal and external stakeholders, including senior executives, to influence data-driven strategies and decisions.
Continuously engage teams and improve their performance by holding recurring meetings, knowing your people, managing career development, and understanding who is at risk.
Oversee deployment, monitoring, and scaling of ETL and agent workloads across multi-cloud environments.
Continuously improve platform performance, cost efficiency, and automation maturity.
Requirements
Hands-on experience in data engineering, data platform strategy, or a related technical domain.
Proven experience leading global data engineering or platform engineering teams.
Proven experience in building and modernizing distributed data platforms using technologies such as Apache Spark, Kafka, Flink, NiFi, and Cloudera/Hadoop.
Strong experience with one or more data pipeline tools (NiFi, Airflow, dbt, Spark, Kafka, Dagster, etc.) and distributed data processing at scale.
Experience building and managing AI-augmented or agent-driven systems is a plus.
Proficiency in Python, SQL, and data ecosystems (Oracle, AWS Glue, Azure Data Factory, BigQuery, Snowflake, etc.).
Deep understanding of data modeling, metadata management, and data governance principles.
Proven success in leading technical teams and managing complex, cross-functional projects.
Passion for staying current in a fast-paced field with proven ability to lead innovation in a scaled organization.
Excellent communication skills, with the ability to tailor technical concepts to executive, operational, and technical audiences.
Expertise and ability to lead technical decision-making considering scalability, cost efficiency, stakeholder priorities, and time to market.
Proven track record of leading high-performing teams, including coaching director-level reports and experienced individual contributors.
Advanced degree in Data Science, Computer Science, Information Technology, Business Administration, or a related field. Equivalent experience will also be considered.
Benefits
insurance (including medical, prescription drug, dental, vision, disability, life insurance)
flexible spending account and health savings account
paid leaves (including 16 weeks of new parent leave and up to 20 days of bereavement leave)
80 hours of Paid Sick and Safe Time, 25 days of vacation time and 5 personal days, pro-rated based on date of hire
10 annual paid U.S. observed holidays
401k with a best-in-class company match
deferred compensation for eligible roles
fitness reimbursement or on-site fitness facilities