Senior Data Engineer designing and overseeing data pipelines in Databricks on AWS. Responsible for data quality and performance for enterprise analytics and AI workloads.
Responsibilities
Design, build and maintain scalable ETL/ELT pipelines that ingest, transform and deliver trusted data for analytics and AI use cases.
Build data integrations with well-known SaaS platforms such as Salesforce, NetSuite, Jira and others.
Implement incremental and historical data processing to ensure accurate, up-to-date data sets.
Ensure data quality, reliability and performance across pipelines through validation, testing and continuous code optimization.
Contribute to data governance and security by supporting data lineage, metadata management and data access controls.
Support production operations, including monitoring, alerting and troubleshooting.
Work with stakeholders to translate business and technical requirements into well-structured, reliable datasets.
Share knowledge and contribute to team standards, documentation and engineering best practices.
Requirements
Data Ingestion & Integration: hands-on experience building robust ingestion pipelines using tools and patterns such as Databricks Auto Loader, Lakeflow Connectors, Fivetran and/or custom API / file-based integrations.
Core Data Engineering: strong development experience using SQL, Python and Apache Spark (PySpark) for large-scale data processing.
Data Pipeline Orchestration: proven experience developing and operating data pipelines using Databricks Workflows & Jobs, Delta Live Tables (DLT) and/or Lakeflow Declarative Pipelines.
Incremental Processing & Data Modelling: deep understanding of incremental data loading, including Change Data Capture (CDC), MERGE operations and Slowly Changing Dimensions (SCD) in a Lakehouse environment.
Data Transformation & Lakehouse Design: experience in designing and implementing Medallion Architecture (bronze, silver and gold) using Delta Lake.
Data Quality, Testing & Observability: experience implementing data quality checks with tools and frameworks such as DLT expectations, Great Expectations or similar, including pipeline testing and monitoring.
Data Governance & Lineage: hands-on experience with data cataloguing, lineage and metadata management within Unity Catalog to support governance, auditing and troubleshooting.
Performance Optimization: experience tuning Spark and Databricks workloads, including partitioning strategies, file sizing, query optimization and efficient use of Delta Lake features.
Production Engineering Practices: experience working with code versioning (Git), peer review and promoting pipelines through development, test and production environments.
Security & Access Control Awareness: understanding of data access control, sensitive data handling and working with Unity Catalog in the context of governed environments.
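To illustrate the incremental-loading pattern named in the requirements: in Databricks, Auto Loader tracks discovered files automatically, but the underlying idea is watermark bookkeeping. The sketch below shows that idea in plain Python (function and field names are hypothetical, not Databricks APIs):

```python
def incremental_load(all_files, state):
    """Select only files newer than the stored watermark, then advance it.

    all_files: list of (path, modified_ts) tuples from the landing zone.
    state: dict holding the last processed timestamp (the "watermark").
    """
    watermark = state.get("watermark", 0)
    # Only files modified after the watermark are new work.
    new_files = sorted(
        ((p, ts) for p, ts in all_files if ts > watermark),
        key=lambda f: f[1],
    )
    if new_files:
        # Advance the watermark so the next run skips these files.
        state["watermark"] = max(ts for _, ts in new_files)
    return [p for p, _ in new_files]
```

Re-running the function with the same file listing returns nothing new, which is the property that makes incremental ingestion idempotent.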
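The CDC / MERGE / SCD Type 2 requirement above has a compact core: close the current version of a changed row and open a new one. In a Lakehouse this is a Delta Lake `MERGE INTO`; the plain-Python sketch below (illustrative only, with assumed column names) captures the same logic:

```python
def scd2_upsert(dim_rows, changes, today):
    """Apply CDC changes to a Type 2 Slowly Changing Dimension.

    dim_rows: list of dicts with keys id, value, valid_from, valid_to, is_current.
    changes:  list of dicts with keys id, value (latest snapshot per key).
    Returns a new list of dimension rows with history preserved.
    """
    changed = {c["id"]: c["value"] for c in changes}
    out, seen = [], set()
    for row in dim_rows:
        if row["is_current"] and row["id"] in changed:
            seen.add(row["id"])
            if changed[row["id"]] != row["value"]:
                # Close the old version and open a new current one.
                out.append({**row, "valid_to": today, "is_current": False})
                out.append({"id": row["id"], "value": changed[row["id"]],
                            "valid_from": today, "valid_to": None,
                            "is_current": True})
            else:
                out.append(row)  # unchanged: keep as-is
        else:
            out.append(row)  # historical rows pass through untouched
    # Keys never seen before become fresh current rows.
    for cid, val in changed.items():
        if cid not in seen:
            out.append({"id": cid, "value": val, "valid_from": today,
                        "valid_to": None, "is_current": True})
    return out
```

The same conditions (matched-and-changed, matched-and-unchanged, not-matched) map one-to-one onto the `WHEN MATCHED` / `WHEN NOT MATCHED` clauses of a Delta `MERGE` statement.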
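The data-quality requirement follows the expect-or-drop pattern popularized by DLT expectations and Great Expectations: named rules are evaluated per row, failures are counted for observability, and failing rows are quarantined rather than silently dropped. A minimal framework-free sketch of that pattern (names are hypothetical):

```python
def apply_expectations(rows, expectations):
    """Split rows into (passed, quarantined) by named data-quality rules.

    expectations: dict mapping rule name -> predicate(row) -> bool.
    Returns passed rows, quarantined rows, and per-rule failure counts.
    """
    passed, quarantined = [], []
    metrics = {name: 0 for name in expectations}
    for row in rows:
        failures = [name for name, pred in expectations.items()
                    if not pred(row)]
        for name in failures:
            metrics[name] += 1  # per-rule counts feed monitoring dashboards
        (quarantined if failures else passed).append(row)
    return passed, quarantined, metrics
```

In production the metrics dict would be what alerting thresholds are set against, mirroring DLT's per-expectation pass/fail counters.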
Benefits
We strive to make any required adjustments where possible to make the process fair and equitable for everyone. Please reach out to [email protected] if you need any accommodations throughout the interview process. Nuix is an equal opportunities employer.

Don’t let imposter syndrome hold you back! We welcome all applications and are a flexible employer. As we expand our global team and extend our skills and expertise, we are unified as one Nuix team guided by our shared values.

Nuix creates innovative software that empowers organizations to simply and quickly find the truth from any data in a digital world. We are a passionate and talented team, delighting our customers with software that transforms data into actionable intelligence.

Love the role, but not the right fit for you? Know someone that might be awesome for this role? We're always looking for talented people who want to make a real impact. If you refer someone and we successfully hire them, you'll receive a $1,000 gift card.