Site Reliability Engineer improving system reliability and performance in production environments with a focus on automation and operational efficiency. Collaborating with engineering and infrastructure teams on deliverable-focused projects.
Responsibilities
Design, develop, test, and deploy automation tools, scripts, and engineering solutions to improve the stability, performance, and efficiency of production systems.
Identify opportunities to automate manual operational processes and reduce operational overhead.
Support and improve the release and deployment lifecycle of applications, ensuring reliable and controlled production rollouts.
Collaborate with software engineers and infrastructure teams to troubleshoot and resolve system issues.
Contribute to system design discussions, platform management, and capacity planning.
Create and maintain clear technical documentation for automation tools, operational procedures, and reliability improvements.
Provide regular updates on progress and deliverables to engineering stakeholders.
Requirements
At least 1 year of professional software development or reliability engineering experience.
Proficiency in one or more programming or scripting languages, such as Python, C++, Java, or shell.
Strong understanding of Linux operating system internals.
Solid knowledge of networking concepts and troubleshooting.
Experience with modern version control systems such as Git.
Familiarity with monitoring, logging, and CI/CD tools (e.g., Prometheus, Grafana, Splunk, Jenkins, GitLab CI) is highly beneficial.
Ability to work independently, manage priorities effectively, and deliver results with minimal supervision.
Excellent written and verbal communication skills, with the ability to clearly communicate technical topics to engineering stakeholders.
Ability to quickly learn new technologies and tools and work across multiple programming languages and frameworks.