Lead Systems Engineer managing AI platform operations at an emerging AI infrastructure start-up. Oversee vendor collaboration, technical troubleshooting, and customer engagement for optimal service delivery.
Responsibilities
Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses
Monitor system health, alerts, and customer usage patterns
Document solutions and workarounds, create and maintain knowledge base content, and document support procedures
Automate common tasks and fixes
Configure and integrate tooling to support optimal operation of the platform, and support tool selection
Assist customers with platform configuration, onboarding, and usage best practices
Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
Ensure SLAs and customer satisfaction targets are met
Provide L1 support for customer-reported issues and requests
Provide L2 support by diagnosing, replicating, and troubleshooting issues across platform and infrastructure
Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow, and billing
Requirements
Extensive experience in technical support, system engineering, or platform operations
Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting)
Familiarity with cloud-based platforms, APIs, and distributed systems
Understanding of AI/ML concepts and tooling (model training, inference, data pipelines basics)
Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk)
Excellent communication skills to interface with customers as well as internal and vendor teams
Good understanding of the tooling requirements of ML engineers and data scientists, and how to optimize their experience
System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning
Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration
Understanding of automation, monitoring, and security in GPU-as-a-service environments
Senior Systems Integration Specialist building large-scale integrations and contributing to TypeScript/React codebases at an HR tech firm. Collaborating with cross-functional teams to deliver high-impact solutions.
Business Systems Analyst gathering requirements and designing solutions for business needs. Overseeing implementation and ensuring systems align with business goals.
Software Systems Engineer I focusing on logistics software installation and development, requiring travel and collaboration with international teams for project-related work.
Senior Network System Engineer at Kempower focusing on connectivity for fast charging solutions. Collaborating on embedded Linux, networking technologies, and IoT infrastructures for worldwide chargers.
Lead Embedded Systems Developer at Vaisto Solutions focusing on designing and optimizing embedded systems for major tech companies while encouraging continuous learning.
Subject matter expert Systems Engineering Architect supporting a training network that replicates an operational enterprise environment at CACI. Engage in Cyber Operations with growth opportunities and team building.
Embedded Software Engineer working on Aerospace & Defence projects for Capgemini Engineering. Developing avionic software applications and participating in various technical activities.
Systems Engineer applying systems engineering principles working closely with government stakeholders and program teams at Markon. Ensuring technical excellence and successful delivery throughout the project lifecycle.
Systems Engineer ensuring technical excellence across the system lifecycle at Markon. Collaborating with government stakeholders and program teams to deliver high-quality systems engineering solutions.
Systems Architect at Markon supporting Agile project execution and logistics coordination. Leading Scrum meetings and managing procurement processes for successful delivery.