Senior Operations Engineer driving efficiency and reliability in NVIDIA's global business operations. Collaborating with IT subsystems and automating operational workflows for organizational impact.
Responsibilities
Driving day-to-day interactions with NVIDIA-wide IT subsystems, ensuring smooth operational workflows across infrastructure and applications.
Crafting and maintaining GitLab CI/CD pipelines to automate build, test, and deployment workflows.
Monitoring system health, building/maintaining dashboards, creating alerts, and producing operational reports.
Performing user offboarding, access reviews, and compliance-related tasks across multiple systems.
Ensuring API performance and integration stability across IT subsystems meet defined SLAs and SLOs.
Coordinating changes and releases between engineering, operations, and security teams.
Enforcing security guidelines, managing vulnerability remediation, and collaborating with security teams on audits and assessments.
Maintaining documentation, SOPs, and process improvements to enhance operational maturity.
Requirements
8+ years of hands-on experience building/supporting complex services
BS/MS in Computer Science (or equivalent experience)
Knowledge of Python for automation, data handling, and tool development
Experience with monitoring tools (such as Prometheus, Grafana, Datadog, CloudWatch, Splunk)
Familiarity with ITSM practices, including incident, problem, and change management processes
Ability to perform secure and compliant offboarding and access-related tasks
Strong understanding of IT operations and system workflows