AI Infrastructure Engineer focusing on scalable backend systems for AI workflows in a fast-paced startup. Collaborating on reliability, data performance, and infrastructure for rapid growth.
Responsibilities
Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.
Requirements
4+ years of backend engineering experience (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading (including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions) is a must.
Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record of scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.
Cloud Infrastructure Specialist handling Azure operations and vendor coordination. Driving resilient infrastructure projects with a collaborative, impact-driven team in Warsaw.
Windows Server Infrastructure Engineer maintaining critical enterprise Windows Server environments. Supporting DoD security compliance and infrastructure management for Federal clients in multiple locations.
Infrastructure Engineer contributing to AWS architecture and automation at Oddin.gg. Collaborating with teams to optimize performance and support developer experience.
Cloud Platform Infrastructure Engineer optimizing and managing cloud-native systems in Austin, TX. Collaborating with global teams and participating in agile development processes.
Senior Specialist Infrastructure Architect at Baker Hughes focusing on digital transformation and cybersecurity. Responsible for infrastructure architecture and mentoring team members within the organization.
ML Infrastructure Engineer developing Cloud Data Infrastructure to support Assured AI for Autonomy. Designing and developing infrastructure to enhance Bluespace's APNT capabilities.
Senior Data Infrastructure Engineer responsible for modernizing the data platform while optimizing for cost efficiency and ensuring scalability. Joining a team focused on user-friendly solutions and data accessibility.
Lead Infrastructure Engineer managing endpoint vulnerabilities and configuration compliance at Truist. Collaborating with engineering and security teams to drive risk reduction and governance.
Senior Cloud Infrastructure Engineer at InfoTrack executing cloud strategy. Designing, building, and optimizing secure, scalable infrastructure while collaborating with global teams.
Principal Engineer leading design and implementation of secure architectures for Walmart’s AI Security Team. Responsibilities include risk management, capacity planning, and cross-team collaboration.