Dive into Grab’s engineering journey to optimise a core ML model. Learn how we built the Triton Server Manager and used Triton Inference Server (TIS) to achieve a 50% reduction in tail latency and seamlessly migrate over 50% of online deployments.

Grab is a leading technology company in Southeast Asia, offering a wide range of services, including ride-hailing, food delivery, and digital payments. Through its platform, Grab provides convenience, accessibility, and reliability to millions of users across the region. From its blog, developers can learn from Grab's innovative approach to technology and business, gaining insights into building scalable, customer-centric platforms that address real-world challenges and improve people's lives.

Grab Tech Blog

Grab migrated its ML model-serving platform, Catwalk, to NVIDIA Triton Inference Server, achieving a 50% reduction in tail latency and 20% cost savings. The team built a Triton Manager component to ensure backward compatibility and a zero-downtime migration, and transitioned over 50% of online deployments within 10 days. Triton's multi-framework support, unified API, and advanced features such as dynamic batching addressed the performance issues that came from maintaining multiple legacy inference engines for ONNX, PyTorch, and TensorFlow models.
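Dynamic batching groups individual inference requests that arrive close together into a single batch, trading a small, bounded queueing delay for better hardware utilisation. The Python sketch below is a simplified, hypothetical model of that idea, not Triton's implementation. It dispatches a batch as soon as it holds `max_batch_size` requests, or when its oldest request has waited `max_queue_delay_ms`. The parameter names loosely echo Triton's `max_batch_size` and `max_queue_delay_microseconds` model-configuration fields, but this sketch works in milliseconds and ignores preferred batch sizes and model-instance availability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    request_ids: List[int]  # indices into the original arrivals list
    dispatch_ms: float      # simulated time at which the batch is sent to the model


def form_batches(arrivals_ms: List[float], max_batch_size: int,
                 max_queue_delay_ms: float) -> List[Batch]:
    """Hypothetical dynamic-batching simulation (not Triton's scheduler).

    A batch opens when its first request arrives and is dispatched either when it
    is full or when that first request has waited max_queue_delay_ms.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")

    # Process requests in arrival order; sorted() is stable for equal timestamps.
    order = sorted(range(len(arrivals_ms)), key=lambda i: arrivals_ms[i])
    batches: List[Batch] = []
    current: List[int] = []
    deadline = 0.0

    for i in order:
        t = arrivals_ms[i]
        # The open batch's wait expired before this request arrived: flush it at its deadline.
        if current and t > deadline:
            batches.append(Batch(current, deadline))
            current = []
        # A new batch starts its queue-delay clock at its first request's arrival.
        if not current:
            deadline = t + max_queue_delay_ms
        current.append(i)
        # Full batch: dispatch immediately at the arrival time of the request that filled it.
        if len(current) == max_batch_size:
            batches.append(Batch(current, t))
            current = []

    # Any partially filled batch left over is dispatched when its wait expires.
    if current:
        batches.append(Batch(current, deadline))
    return batches


if __name__ == "__main__":
    for b in form_batches([0, 1, 2, 10, 11, 30], max_batch_size=3, max_queue_delay_ms=5):
        print(b.request_ids, b.dispatch_ms)
```

For example, with arrivals at 0, 1, 2, 10, 11 and 30 ms, a batch size of 3 and a 5 ms delay, the first three requests ship together at 2 ms. Requests 3 and 4 ship at 15 ms when their wait expires, and request 5 ships alone at 35 ms. Raising the delay lets more requests share a batch at the cost of extra queueing time, which is the latency-versus-throughput trade-off the real configuration exposes.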

Modernising Grab’s model serving platform with NVIDIA Triton Inference Server