Senior DevOps managing data platform workloads on AWS. Collaborating on Data Mesh architecture and optimizing data pipelines in a hybrid work environment.
Responsibilities
Manage, capacity-plan, and operate workloads running on EC2 clusters via Databricks/EMR to ensure efficient and reliable data processing
Collaborate with stakeholders to design and implement a Data Mesh architecture across multiple closely related but separate enterprise entities
Utilize Infrastructure as Code (IaC) tools such as CloudFormation or Terraform to define and manage data platform user access to data and compute resources.
Implement role-based access control (RBAC) mechanisms using IaC templates to enforce least privilege principles and ensure secure access to data and compute resources
Collaborate with cross-functional teams to design, implement, and optimize data pipelines and workflows
Utilize distributed engines such as Spark to process and analyze large volumes of data efficiently when required
Develop and maintain operational best practices for Spark and other data warehousing tools to ensure system stability and performance
Implement and manage storage technologies to efficiently store and retrieve data as per business requirements
Troubleshoot and resolve platform-related issues in a timely manner to minimize downtime and disruptions
Stay updated on emerging technologies and industry trends to continuously enhance the data platform infrastructure
Document processes, configurations, and changes to ensure comprehensive system documentation.
Requirements
Knowledge of one or more of the following: **AWS CloudFormation or Terraform** for infrastructure provisioning
Knowledge of source control and related concepts (**GitLab/Git flow, trunk-based development, branches**, etc.)
Familiarity with at least one programming language (**Python, Bash**, etc.)
Familiarity with a distributed compute engine such as **Spark**
Familiarity with a data platform or data orchestration tool such as **Databricks/Airflow**
In-depth working knowledge of and experience with **AWS IAM, VPC, EC2, RDS, DynamoDB, DMS**, and **S3**
Experience with CI/CD tools (such as **Jenkins, TeamCity, AWS CodePipeline, CodeDeploy**) or configuration management tools (such as Ansible, Chef, Puppet, etc.)
DevOps mindset with automation and operational excellence in mind
Good English skills and the ability to communicate effectively with business and technical teams
Good logical thinking and problem-solving skills
**Be curious and have a self-learning attitude**
Big Plus:
AWS Data Engineer Associate or DevOps Professional Certifications
You are:
Passionate about technology
Independent but also a team player
Comfortable with a high degree of ambiguity
Focused on usability and speed
Keen on presenting your ideas to your peers and management.
Benefits
Meal and parking allowances are covered by the company.
Full benefits and salary rank during probation.
Insurance as required by Vietnamese labor law, plus premium health care for you and your family.
SMART goals and clear career opportunities (technical seminar, conference, and career talk) - we focus on your development.
Values-driven, international working environment, and agile culture.
Overseas travel opportunities for training and work-related activities.
Internal Hackathons and company events (team building, coffee run, etc.).
Pro-rata and performance bonuses.
15 days of annual leave plus 3 days of company-provided sick leave per year.