Design, develop, and implement intelligent automation and AI-driven solutions at Securonix, with a focus on improving reliability and efficiency across SaaSOps environments through AI integration.
Responsibilities
Design and deploy GenAI-enabled workflows that support proactive incident detection, root cause analysis (RCA) summarization, and resolution automation.
Integrate LLM-based copilots (e.g., ChatGPT, Claude, or internal GPT models) with SaaSOps tools to enable smart troubleshooting and generate impact summaries.
Implement automation logic to handle Spark job failures, pipeline restarts, and alert remediation without human intervention.
Build and maintain end-to-end automations across monitoring and incident management systems (Jira, Slack, ServiceNow, CriticalMon, SparkOpsSuite).
Orchestrate recurring SaaSOps tasks using tools like AWX, Airflow, or Python-based automation frameworks.
Collaborate with engineering and product teams to enhance telemetry from Spark, EKS, and multi-tenant pipelines.
Use GenAI to generate incident summaries, health reports, and RCA narratives in real time.
Identify high-impact manual SaaSOps workflows suitable for automation or AI augmentation.
Requirements
5–8 years of experience in SaaSOps, Cloud Operations, or IT Automation.
Strong understanding of Spark, EKS, AWS infrastructure, and SaaS application operations.
Proven experience with Python scripting and REST API integrations.
Hands-on experience with automation/orchestration tools such as AWX, Airflow, or UiPath.
SRE responsible for ensuring reliability and performance of IT systems at a digital transformation company specializing in public sector efficiency. Collaborating on system health, incident response, and automation tasks.
Senior DevOps role at Beyond Soluções managing CI/CD for .NET and Kubernetes applications. Collaborating on cloud solutions while fostering a culture of innovation and quality.
Senior Software Engineer at PayPal managing cloud infrastructure and DevOps solutions. Delivering complete SDLC solutions and guiding engineering teams for scalable and reliable services.
Senior Site Reliability Engineer at Diligent leading reliability, automation, and observability across cloud infrastructure. Build tools for incident response and enhance performance in fast-paced environments.
Perception Deployment Engineer deploying deep learning models on embedded systems at Caterpillar. Collaborating with cross-functional teams for integration and optimization of perception modules in vehicles.
Principal Site Reliability Engineer at AT&T designing scalable solutions for critical operations with minimal downtime. Collaborating with teams to monitor and improve system performance in cloud environments.
DevOps Engineer managing AI SaaS infrastructure at a high-growth European company. Supporting AI model deployment, integrating multiple systems, and ensuring platform security and compliance.
Engineering Manager leading teams for observability platforms at LexisNexis. Owns operational excellence across the software delivery lifecycle in Raleigh, NC.
Reliability Engineer optimizing site facility infrastructure and utility systems at Roche. Conducting root cause analyses and developing maintenance plans to enhance reliability and efficiency.
DevOps SME designing, implementing, and operating multi-cloud platforms for The Missing Link. Collaborating with engineering, security, and operations teams while embedding DevOps best practices.