DevOps role managing IT infrastructure with a focus on AWS and Azure at an innovative tech company. Collaborate on projects and optimize cloud operations.
Responsibilities
Implement and manage IT infrastructure, including servers, storage systems and networks (AWS, Google Cloud and Azure).
Use tools for infrastructure provisioning.
Implement topologies with high availability, fault tolerance and scalability.
Implement environment monitoring tools and alerting.
Install and configure management systems for web servers, databases and jobs, and build basic pipelines in continuous delivery tools to automate CI/CD processes.
Work on process automation projects.
Provide support and troubleshoot existing IT infrastructure.
Provide training and guidance to other team members on DevOps practices.
Requirements
Bachelor's degree (completed).
Knowledge of the AWS Well-Architected Framework pillars and ITIL methodology.
Knowledge of log monitoring tools and operating systems (Linux and Microsoft Windows).
Advanced knowledge of continuous delivery tools for CI/CD process automation and of infrastructure as code; experience with Azure DevOps and Jira.
Experience with code repositories and APM tools.
Ability to analyze issues, identify root causes, and produce incident, performance and configuration reports.
Knowledge of other cloud providers (Google Cloud and Microsoft Azure).
Manage IT infrastructure, including servers, storage systems and networks.
Advanced expertise in Cloud Computing and knowledge of the AWS Well-Architected pillars.
Develop and manage automated CI/CD pipelines using multiple stacks (Git, Jenkins, CodePipeline, CodeBuild, Azure DevOps, etc.).
Advanced knowledge of building and provisioning environments using IaC (Infrastructure as Code) and associated solutions (HashiCorp Terraform and AWS CloudFormation).
Advanced knowledge of building and maintaining container clusters and orchestration across multiple stacks (Docker, Kubernetes, ECS, EKS).
Implement topologies and solutions for high availability, fault tolerance and automatic scalability.
Document environment architectures.
Configure and recommend add-ons for monitoring, logging, authorization and networking (Prometheus, Thanos, kube2iam, Loki, Elasticsearch, Fluentd, nginx-ingress-controller).
Implement cloud environment monitoring tools and alerting.
Develop scripts for infrastructure automation (shell scripts, Python, PowerShell, Node.js).
Self-management: plan short- and long-term activities and deliverables according to team needs.
Be customer-facing: maintain client relationships, lead project status meetings and be proactive.
AWS certification at Specialty or Professional level is a plus.
Knowledge of Python.
Familiarity with monitoring/logging tools (Prometheus, Elasticsearch, Grafana, Kibana, New Relic, Datadog, Zabbix).
Kubernetes certification (CKA and/or CKAD) is a plus.
Benefits
Hybrid (3 days on-site and 2 days remote per week)