AI Data Pipeline Engineer designing and operating high-throughput systems for petabyte-scale data delivery. Collaborating across teams to ensure data flows into AI workloads efficiently.
Responsibilities
Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization.
Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization.
Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects.
Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs.
Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads.
Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.
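The pipeline-branching responsibility above can be illustrated with a minimal, pure-Python sketch: one ingest step fans out to two isolated downstream branches (here, a hypothetical training branch and evaluation branch). All function and field names are illustrative assumptions, not part of this role's actual stack.

```python
# Toy sketch of a branched pipeline with logic isolation.
# One shared ingest step fans out to independent branches that
# share no mutable state, so a change in one branch cannot
# corrupt the other. Names are illustrative only.

def ingest(records: list[dict]) -> list[dict]:
    # Drop malformed records before fan-out.
    return [r for r in records if "text" in r]

def training_branch(records: list[dict]) -> list[str]:
    # One branch: normalize text for a training set.
    return [r["text"].lower() for r in records]

def eval_branch(records: list[dict]) -> list[str]:
    # Second branch: keep only held-out records, untouched.
    return [r["text"] for r in records if r.get("split") == "eval"]

def run(records: list[dict]) -> dict[str, list[str]]:
    clean = ingest(records)
    # Both branches consume the same cleaned input independently.
    return {"train": training_branch(clean), "eval": eval_branch(clean)}
```

In a production system each branch would be a separate task in an orchestrator rather than a function call, but the isolation principle is the same.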
Requirements
Extensive professional experience in building and operating production-grade data pipelines for massive-scale AI/ML datasets.
Strong proficiency in distributed processing frameworks, particularly Apache Spark and the Databricks ecosystem.
Deep hands-on experience with workflow orchestration tools like Apache Airflow for managing complex dependency graphs.
Solid understanding of Kubernetes and containerization for deploying and scaling data processing components.
Proficiency in distributed messaging systems such as Apache Kafka for high-throughput data ingestion and event-driven architectures.
Expert-level programming skills in Python, including system-level performance optimization.
Strong knowledge of cloud-native services and best practices for building secure and scalable data infrastructure.
Logical approach to problem-solving with the persistence to identify and resolve root causes in complex, large-scale systems.
Strong communication skills to effectively collaborate with cross-functional teams and external partners.
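The high-throughput ingestion requirement above often comes down to micro-batching: grouping a stream of events so downstream writes happen per batch rather than per event. The following is a toy stand-in for what a Kafka consumer loop would do, using only the standard library; the event shape is an assumption for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator

def micro_batches(events: Iterable[dict], size: int) -> Iterator[list[dict]]:
    # Group a stream of events into fixed-size batches so a
    # downstream sink is written once per batch, not per event.
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

A real consumer would additionally commit offsets after each batch is durably written, so a crash replays at most one batch.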
Benefits
When submitting your resume, please exclude information that is prohibited from being requested under the Fair Hiring Procedure Act, such as resident registration number, family relations, marital status, salary, photo, physical characteristics, and region of origin.
Please upload all files as PDFs of 30MB or less. (If you encounter a problem while uploading your resume, please send it to [email protected] together with the URL of the position you are applying for.)