This post discusses the challenges of handling frequent updates in a data lake and introduces the Hudi format as a solution. It explains the configurations optimized for high- and low-throughput sources, how to connect to Kafka and RDS data sources, the importance of indexing for Hudi tables, and the impact of adopting Hudi.

7 min read · From engineering.grab.com
Table of contents

- Introduction
- High throughput source
- Low throughput source
- Connecting to our Kafka (unbounded) data source
- Connecting to our RDS (bounded) data source
- Indexing for Hudi tables
- Impact
- What’s next?
- References
