Software Engineer developing backend services for AION's AI cloud platform. Collaborating with teams to implement high-performance, scalable distributed systems in a hybrid work environment.
Responsibilities
Build and maintain platform services across AION's Compute and Inference platforms, working closely with senior engineers and platform leads
Implement features for multi-cloud orchestration, resource scheduling, model deployment pipelines, and autoscaling systems
Write maintainable, production-grade code with proper abstractions, design patterns, and comprehensive test coverage
Contribute to low-level design (LLD) including service APIs, database schema design, data models, and component interactions
Collaborate with senior engineers on high-level design discussions, providing implementation perspectives and feasibility inputs
Develop RESTful APIs and gRPC services for platform control planes, resource management, and inference serving
Design and implement database schemas for storing platform state, resource metadata, billing data, and observability metrics
Work with distributed storage systems, message queues (Kafka, RabbitMQ), and databases (PostgreSQL, Redis) to build reliable platform components
Build event-driven architectures for asynchronous processing, job scheduling, and platform automation
Implement monitoring, logging, and alerting for platform services to ensure production reliability
Write comprehensive unit tests, integration tests, and end-to-end tests to ensure code reliability
Participate in code reviews, providing constructive feedback and learning from senior engineers' perspectives
Refactor existing code to improve maintainability, performance, and scalability
Document design decisions, API specifications, and operational runbooks for platform services
Debug production issues and contribute to incident response and post-mortems
Requirements
2-4 years of experience in backend engineering, platform development, or distributed systems
Strong proficiency in Golang: you write idiomatic Go code with proper error handling, concurrency patterns, and testing
Solid understanding of backend systems fundamentals: RESTful APIs, microservices architecture, and API design principles
Hands-on experience with databases (PostgreSQL, MySQL) including schema design, query optimization, and transactions
Familiarity with storage systems (object storage like S3, block storage, distributed file systems) and their use cases
Experience working with message queues (Kafka, RabbitMQ, NATS) and event-driven architectures
Understanding of distributed systems concepts: consensus, eventual consistency, fault tolerance, and retry mechanisms
Experience with containerization (Docker) and basic Kubernetes concepts
Knowledge of testing frameworks and practices (unit tests, integration tests, mocking)
Familiarity with Git, CI/CD pipelines, and modern development workflows
Exposure to cloud platforms (AWS/GCP/Azure) and their core services is a plus
Experience with infrastructure-as-code (Terraform) or observability tools (Prometheus, Grafana) is beneficial
Benefits
**Preferred Attributes:**
Founder-level ownership and bias for action.
Strong strategic thinking and ability to connect technical decisions to business impact.
Excellent communication and mentoring skills.
Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
**Why Join AION?**
Work directly with high-pedigree founders shaping technical and product strategy.
Build infrastructure powering the future of AI compute globally.
Significant ownership and impact with equity reflective of your contributions.
Competitive compensation, flexible work options, and wellness benefits.