AI/ML Infrastructure Architect at the Allen Institute developing engineering infrastructure for AI/ML applications. Collaborating with cross-functional teams to support bioscience research.
Responsibilities
Develop and lead a cloud-agnostic, state-of-the-art engineering infrastructure at the Allen Institute to support AI/ML research and applications.
Procure and deploy GPUs to meet computational demands.
Coordinate infrastructure implementation with external partners.
Lead best practices and policies for data management, software infrastructure, and AI/ML workflows.
Manage and lead a team of engineers.
Develop and implement policies and software for efficient management, prioritization, and scheduling of AI workloads.
Implement cost tracking and reporting to provide transparency and prevent overruns.
Collaborate with science unit teams, providing training and support to accelerate adoption of the new AI pipeline.
Ensure integration of AI infrastructure with existing platforms.
Develop and oversee a governance framework to ensure that use of GPU resources aligns with the institute's scientific priorities.
Regularly review and adjust resource allocation based on governance inputs.
Help establish community standards for scalability in developing, disseminating, and evaluating AI/ML/computational methods for scientific problems.
Participate in institute-wide initiatives, workshops, and seminars to promote engineering excellence through technical leadership, cross-disciplinary collaboration and knowledge sharing.
Requirements
Bachelor's degree in Computer Engineering or a related technical field, or equivalent experience
7 years of experience working with MLOps in medium- to large-scale GPU clusters and/or cloud-based ML deployments
Experience with building, deploying and maintaining machine learning models
Proficiency with cloud computing (AWS, GCP or Azure) and with on-prem clusters
Experience with databases and large-scale data management
Working knowledge of custom AI/ML libraries and AI/ML execution platforms
Proven ability to work independently and manage multiple projects simultaneously while meeting deadlines
Excellent written and verbal communication skills, with the ability to collaborate effectively in a multidisciplinary team environment
AI/ML Engineer at a global networking leader, shaping ML strategy and building high-performance systems. Innovating with AI technology to enhance network management and develop flagship products.
Senior Staff Machine Learning Engineer leading technical architecture for GEICO's AI Agent Platform. Driving innovation and enhancing productivity for internal associates and customers.
Staff Machine Learning Engineer developing the next generation of AI Agent OS and SDKs for GEICO. Key responsibilities include architecting scalable systems and implementing observability frameworks.
Senior Machine Learning Engineer at Bumble developing scalable AI systems for personalized user interactions. Leading machine learning model development and deployment from exploration to production.
Lead Machine Learning Engineer at Bumble shaping user connections through machine learning. Driving end-to-end AI solutions while mentoring engineers in a hybrid work environment.
Designing and operating cloud-based MLOps capabilities supporting analytical and generative AI models. Collaborating with data science and business teams for high-impact AI solutions.
Machine Learning Engineer analyzing data structures and developing ML models for customer profiling in Azerbaijan. Collaborating on probabilistic modeling and data quality improvement.
Machine Learning Engineer at HackerRank working on integrity systems to improve model quality. Collaborating on strategies for new signals like audio analysis and behavioral anomalies.
Machine Learning Engineer developing integrity systems for assessing model quality at HackerRank. Collaborating on multimodal signal processing and improving model performance.
Architect designing enterprise-grade AI/ML architectures for Quantiphi. Leading AI applications and ML strategy with a focus on scalability, security, and integration.