L3 Storage SRE role at Morgan Stanley focusing on infrastructure stability and reliability. Responsibilities include deploying storage solutions and automating operational tasks.
Responsibilities
Work with industry-leading technologies in a complex and challenging environment
Provide level 3 escalation and on-call support for a large, globally distributed production environment
Support the production environment during outage scenarios
Deploy new distributed and storage infrastructure, including new data center build-outs
Interact with engineering teams on the testing and certification for new hardware and software products
Provide operational insights and requirements for new developments
Proactively work on health & hygiene tasks to maintain stability
Identify opportunities for automation and create automated solutions
Contribute to internal documentation to improve operational effectiveness
Work directly with third-party industry-leading vendors (NetApp, Broadcom, IBM/Lenovo, Dell, Veritas)
Requirements
3+ years of hands-on experience and working knowledge in at least two of the following technologies: Linux administration, NAS technologies (NFS, SMB), block storage array technologies, SAN fabric and switches, storage volume managers (VxVM/LVM or similar), software-defined storage / cloud technologies
Knowledge of Unix and of storage and network protocols and technologies such as NFS, SMB/CIFS, Fibre Channel, iSCSI, TCP/IP, and DNS
Fluency in Python for building automation tools
A practitioner of infrastructure as code who constantly evaluates and pursues opportunities to automate tasks
High levels of motivation and initiative and a proactive, 'self-starter' approach to work are essential
Benefits
Enriching challenges that provide opportunity for constant learning and advancement
Professional development opportunities, including access to Morgan Stanley’s world-class internal training
A supportive and vibrant multinational environment where individual differences are accepted and teamwork is valued