Research Engineer developing infrastructure for Aldea's multimodal AI research team. Building systems that support rapid experimentation at billion-parameter scale in the language and speech domains.
Responsibilities
Build and maintain distributed training infrastructure that supports researchers working on billion-plus-parameter models across the language and speech domains.
Optimize training and inference performance across the stack, delivering significant speedups through framework optimization, custom kernels, and system-level improvements.
Design experiment infrastructure, including automated evaluation pipelines, experiment tracking, and monitoring systems, that enables rapid iteration.
Scale infrastructure from single-node to multi-node distributed training and deploy production inference systems for real-time applications.
Support researchers with fast turnaround on infrastructure issues and maintain high reliability across all systems.
Collaborate with research scientists, data engineers, and leadership to define technical priorities and the infrastructure roadmap.
Requirements
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
3+ years of experience with PyTorch and distributed training frameworks (DDP, FSDP, DeepSpeed, or similar).
Experience training large-scale deep learning models with 1B+ parameters.
Deep understanding of training optimization techniques, including mixed precision, gradient checkpointing, and memory management.
Proven ability to build production-grade ML infrastructure with high reliability.
Track record of delivering significant performance optimizations in ML training or inference systems.
Benefits
Competitive base salary
Performance-based bonus aligned with research and model milestones