Cloud Engineer developing and managing cloud infrastructure solutions for Ford Credit Services. Collaborating with engineering teams to optimize high-performance cloud deployments.
Responsibilities
Design, automate and manage a highly available and scalable cloud deployment that allows development teams to deploy and run their services.
Collaborate with engineering and architecture teams to evaluate and identify optimal cloud solutions, balancing scalability, performance and security.
Design and implement sustainable cloud and platform services.
Build a robust, scalable and stable infrastructure.
Manage the hosting of external containers in the private cloud.
Automate deployments extensively and manage applications in GCP.
Develop and maintain cloud solutions in accordance with best practices.
Ensure the efficient functioning of data storage and processing in accordance with company security policies and cloud security best practices.
Collaborate with engineering teams to identify optimization strategies and help develop self-healing capabilities.
Develop strong observability capabilities.
Identify, analyse and resolve infrastructure vulnerabilities and application deployment issues.
Regularly review existing systems and make recommendations for improvement.
Requirements
Proven work experience in designing, deploying and operating mid- to large-scale public cloud environments.
Proven work experience with Docker/Kubernetes (image building, Kubernetes scheduling).
Experience in package, configuration and deployment management with Helm, Kustomize and ArgoCD.
Proven work experience in onboarding and troubleshooting cloud services.
Proven work experience in provisioning infrastructure as code (IaC) using Terraform Enterprise or the community edition.
Proven work experience in writing custom Terraform providers/plug-ins and Sentinel policy as code.
Professional certification is an advantage.
Public cloud: GCP experience is good to have.
Strong knowledge of GitHub and DevOps practices (Cloud Build experience is an advantage).
Proficiency in scripting and coding in languages such as Python, PowerShell, Go, Java and JavaScript/Node.js.
Proven work experience with messaging middleware: Apache Kafka, RabbitMQ, Apache ActiveMQ.
Proven work experience with API gateways; Apigee is an advantage.
Proven work experience in REST API development.
Proven work experience in security and IAM, SSL/TLS, OAuth and JWT.
Extensive knowledge of and hands-on experience with Grafana and Prometheus micro libraries.
Exposure to Cloud Monitoring and logging.
Experience with distributed storage technologies such as NFS, HDFS, Ceph and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
Experience with automation tools is a priority.
A track record of success in technical engineering.
Must have >5 years of overall experience
Must have >3 years of experience in public cloud
Must have >3 years of experience in Cloud Infrastructure provisioning
Must have >3 years of experience in Cloud Engineering
Must have >3 years of coding/automation experience (with Python/Go/shell)