AI Platform Systems Software Engineer responsible for designing core infrastructure for AI/ML workloads. Join eBay in building a next-generation AI platform for millions of users.
Responsibilities
Design and scale services to orchestrate AI/ML clusters across cloud and on-prem environments
Develop and optimize intelligent scheduling and resource management systems for heterogeneous compute clusters
Integrate Ray Train/Tune for large-scale distributed training workflows and Ray Serve for low-latency, autoscaled inference
Build features to improve reliability, performance, observability, and cost-efficiency of AI workloads at scale
Enhance the control plane to support secure multi-tenancy and enterprise-grade governance
Implement systems for container management, dependency resolution, and large-scale model distribution
Collaborate with ML researchers, applied scientists, and distributed systems engineers to drive platform innovation
Provide production support and work closely with field teams to resolve infrastructure issues
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience)
8-10 years of experience building and maintaining infrastructure for highly available, scalable, and performant distributed systems
Proven expertise with cloud-native technologies (AWS, GCP, Azure) and Kubernetes-based deployments
Hands-on experience running ML training and inference with Ray (ray.io)
Deep understanding of networking, security, authentication, and identity management in distributed/cloud environments
Hands-on experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
Strong coding skills in Go and/or Python; familiarity with other systems-level languages is a plus
Knowledge of Linux internals, containers, and storage systems
Experience optimizing for GPU/accelerator integration (NVIDIA, AMD, TPU, etc.) is highly desirable
Benefits
Full range of medical benefits
Financial benefits
Paid leave benefits, such as PTO and parental leave