Senior Systems Engineer at NVIDIA focused on improving AI cluster resiliency and delivering AIOps solutions. Collaborating with team members to debug complex issues and enhance customer satisfaction.
Responsibilities
Gather and understand internal and external customer requirements to improve AI cluster resiliency, and design AIOps-based solutions that address these needs
Develop automated workflows for issue detection and root cause analysis and closely collaborate with operators to debug sophisticated, full-stack AI cluster problems
Deliver compelling technical presentations and lead hands-on demos or training
Handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged throughout the customer journey
Requirements
Bachelor of Science or equivalent experience
8+ years of networking experience in enterprise or service provider environments, with strong hands-on expertise in routing and switching
Proficient in scripting and automation using Python or similar languages, with strong Linux expertise
Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles
Exceptional oral, written, and presentation skills for clearly communicating complex technical topics
Demonstrated ability to collaborate effectively across teams, partnering with operations, engineering, and product development
Benefits
Equity
Benefits
Job title
Senior Systems Engineer, Artificial Intelligence Operations
Lead Identity Systems Engineer managing identity and access management systems for Sanford Health. Overseeing deployments, configurations, and mentoring team members with a focus on security and compliance.
System Engineer focusing on cloud migration and infrastructure for MobiLab Solutions GmbH. Leading migrations and optimizing systems across Azure and VMware in a hybrid work setting.
Talent Acquisition Systems Analyst supporting LYB's global recruiting CRM data migration and analytics. Leading the data migration project and optimizing reporting tools for better hiring decisions.
Systems Analyst collaborating with business stakeholders to understand requirements and deliver technology solutions. Using Agile methodologies, JIRA, and Confluence for backlog management and for translating requirements into user stories.
Senior Business Systems Analyst supporting analytical aspects of product life cycle for Highmark Health. Collaborating on project specifications and leading teams to deliver quality business solutions.
OCI Cloud Systems Engineer driving cloud-based network stability and operations for federal customers. Engaging in system integration and performance analysis with a dedicated team.
Systems Engineer developing capabilities for current and future macOS, iPadOS, and iOS environments. Enhancing management toolsets like JAMF and Intune at Eversource.
Senior Business Systems Analyst shaping solutions that strengthen governance, risk, and compliance. Join QIC to implement a new enterprise technology GRC platform.
Junior HR & Payroll IT Systems Analyst at Boeing Australia managing application support for PeopleSoft and Aurion. Providing troubleshooting and training, and collaborating with IT and HR teams.
System Engineer supporting the management of IT infrastructure and ensuring system availability at Jupiter Medical Center. Collaborating with various technical teams on critical IT initiatives and operational support.