Infrastructure-Focused Data Engineer for NVIDIA’s Data & Observability Platform, developing data pipelines and managing the Data Lakehouse for massive-scale operations.
Responsibilities
Build Scalable Data Pipelines: Develop and deploy high-throughput, reliable pipelines that move large volumes of telemetry data from global edge locations to our central Data Lakehouse.
Architect the Data Lakehouse: Lead the implementation of our tiered storage strategy. You will design efficient schemas that optimize for both write-heavy real-time ingestion and fast, cost-effective interactive queries.
Orchestration & Automation: Modernize workflow scheduling by implementing robust, code-based data pipelines. You will build workflows that handle complex dependencies, automated backfills, and intelligent retries.
Drive Embedded Data Optimization: Partner directly with internal engineering teams to audit their data usage. You will identify heavy-hitter datasets and the largest storage consumers, refactor inefficient schemas, and enforce lifecycle policies to significantly reduce storage costs.
Manage Data Infrastructure: Own the operation of the underlying platform. You will manage stateful deployments on Kubernetes, optimize Spark performance, and ensure the reliability of our streaming architecture.
Enforce Quality & Governance: Implement automated schema validation and data quality checks to prevent bad data from entering the lake. You will collaborate with security teams to apply automated masking and access controls.
Requirements
BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience).
8+ years of experience in Data Engineering with a strong focus on Infrastructure, Streaming, or Platform building.
Strong Coding Fluency: Expert proficiency in Python for automation, tooling, and orchestration.
Proficiency in Java or Scala for high-performance data processing (Spark/Flink).
Deep Streaming Expertise: Extensive experience with Kafka.
Data Lake Experience: Hands-on experience with modern table formats (Apache Iceberg, Delta Lake, or Hudi) and distributed query engines (Trino/Presto/Spark).
Containerization & Ops: Experience deploying, configuring, and debugging applications on Kubernetes using Helm.
Benefits
Equity and benefits
Job title
Senior System Software Engineer – Data Engineering