ML Inference Router Engineer designing scalable inference systems at eBay. Aiming to support billions of daily requests with a focus on reliability and efficiency.
Responsibilities
Design and build an LLM inference gateway that scales to billions of daily requests with millisecond-level latency.
Develop intelligent request routing, load balancing, and fallback mechanisms across heterogeneous LLM backends (internal and external).
Optimize throughput, cost, and reliability of inference workloads in multi-tenant environments.
Collaborate with platform, research, and product teams to integrate new models and agentic capabilities into the gateway.
Implement observability, tracing, and autoscaling for inference traffic across Kubernetes-based clusters.
Conduct design and code reviews to ensure high standards in distributed systems architecture.
Stay current with advances in LLM serving, inference acceleration, and model APIs to continuously evolve the platform.
Requirements
10+ years of experience building large-scale, fault-tolerant, high-performance distributed systems.
Strong programming skills in one or more of Java, Go, Rust, or C++ (Java preferred for gateway services).
Deep understanding of networking, concurrency, memory management, and performance tuning in production systems.
Proven experience designing and operating low-latency APIs at very large scale (10M+ QPS).
Hands-on experience with Kubernetes, service meshes, and container orchestration at scale.
Strong background in cloud infrastructure (AWS, GCP, Azure) and distributed system design.
Benefits
Full range of medical benefits
Financial benefits
Various paid time off benefits, such as PTO and parental leave
Project Engineer at DOF executing assigned work packages as part of the engineering team. Supporting project or tender execution in a highly skilled and positive environment.
Engaging content developer for diploma programs at L'atelier des Chefs. Collaborating with project leads to create innovative learning paths and ensure quality e-learning materials.
Civil Engineering Works Engineer supervising complex repair and reinforcement projects on civil engineering structures. Based in Nantes, working with clients and technical teams.
Senior Messaging Engineer providing subject-matter expertise in messaging solutions for a global IT team. Designing and maintaining Microsoft Exchange Hybrid infrastructure and Office 365 messaging security.
Senior Project Engineer at Glacier overseeing robotics deployments and customer management. Ensuring seamless project execution and collaboration for environmental impact through recycling solutions.
Software Engineer working with a small team on 5G MAC algorithms for cellular products. Focused on development of innovative, energy-efficient Open RAN solutions.
Engineer developing cellular radio products and ensuring compliance with 3GPP specifications. Collaborating in a Linux environment on coding and debugging customer issues.
Engineer in the Technical Approval team at Suffolk County Council responsible for ensuring the compliance of highway improvements. Collaborating with developers, providing technical approvals and responses for highway infrastructure.
Data Engineer involved in developing backend systems and migrating data to GCP for Consort Group. Focused on APIs, batch processing, and producing production-ready deliverables.