Platform Engineer developing API and CI/CD environments for machine learning at CADDi, improving developer productivity and system reliability.
Responsibilities
Build API and batch execution environments for running machine learning model inference, as well as deployment environments using CI/CD.
Implement monitoring, performance tuning, and other improvements to enhance site reliability in production environments.
Optimize the cost of inference and training platforms.
Create the deployment and operation processes based on input from the modeling and platform teams.
Actively experiment with new ML, MLOps, and infrastructure tools, quickly validating ideas through proof-of-concepts and applying what works to real company products.
Depending on your experience and preferences, you may be assigned to a team other than the one this posting is for. (In that case, we would be happy to discuss this with you at the interview.)
After you join the company, your role may change as the organization grows or as your career goals evolve.
Requirements
5+ years of professional experience as a software engineer
Experience leading software development
We especially value experience leading design, development, operations, and the necessary communication involved in roles such as team lead or project driver, regardless of project size
At least 2 years of experience in ONE or more of the following:
Developing shared platforms or backend systems using cloud infrastructure.
Designing and operating Machine Learning systems (MLOps) with consideration for latency, cost, and non-functional requirements.
Developing Generative AI applications using LLMs, RAG architectures, and Vector Databases.
Hands-on experience with statically typed programming languages (such as TypeScript, Rust, Java/Kotlin, or Go)
General understanding of the core Computer Science concepts behind AI (such as Vector Space, Embeddings, or Inference) and the ability to leverage these principles to build and integrate AI-driven features into software platforms.
Development experience on public cloud platforms such as AWS or Google Cloud.
Fluent business communication skills in English, with the ability to complete daily tasks in English, including text communication and meetings (CEFR B1 or higher).
Must currently reside in Vietnam or have plans to relocate. Foreign nationals must also hold a valid Vietnam work permit or be legally eligible to work in Vietnam.
Experience developing machine learning pipelines using tools such as Vertex AI Pipelines, Kubeflow, Apache Beam, or Spark
Familiarity with at least one ML/AI framework such as scikit-learn, PyTorch, or TensorFlow.
Development experience related to MLOps or SRE
Experience collaborating with ML engineers to continuously improve and deliver machine learning and data science models
Experience building and operating systems such as Data Lakes or Feature Stores
Experience implementing initiatives to improve data quality for data-centric ML model improvement
Experience planning and driving data utilization initiatives—internally or externally—using tools such as BigQuery or Redash
Basic knowledge of algorithms related to machine learning, statistics, linear algebra, and computer science
Experience working with Scrum or Agile methodologies.
Conversational-level Japanese proficiency (Japanese Language Proficiency Test N2 or above is a guideline)
Benefits
Hybrid (in the office at least once a week)
Remote (decided case by case, and limited to those who can travel on business when the company requires it)
Office address:
HCMC: 7F, Gia Loc Building, No. 27-29 Nguyen Cuu Van Street, Ward 17, Binh Thanh District, HCMC
Hanoi: Unit 9.03, 9F, The West Building, 265 Cau Giay Street, Cau Giay Ward, Hanoi
Official full-time employee
Probation period: 2 months
Annual paid leave: 12 days
National holidays
Year-end holidays (December 31 to January 3)
Tet holidays
Others (following Labor Regulations)
13th month salary
Salary review: twice a year
100% of monthly base salary and mandatory social insurance during the 2-month probation period
Premium Health Insurance
Social insurance, health insurance, unemployment insurance, workers’ accident compensation insurance
Annual health check-up
Allowances such as child-care allowance, commuting allowance, and life-event congratulatory gifts
Growth support such as server fee subsidies and support for attending external training courses
Intensive training program (external or internal training courses, workshops, etc.)