The Hugo evolution: Engineering Grab's unified, one-click data ingestion platform with Apache Flink

Grab's engineering team shares how they evolved Hugo, their internal self-service data platform, by replacing a fragmented toolchain (Kafka Connect, Sprinkler, Spark) with Apache Flink as a unified ingestion engine. The new architecture supports one-click MySQL CDC pipelines and self-service Kafka ingestion into Hive tables via S3. Key improvements include reducing onboarding time from days to minutes (Kafka ~6 min, MySQL CDC ~3 min), automated schema detection replacing manual Protobuf-to-Avro mappings, and eliminating intermediary Kafka hops for CDC. Adoption has surged: new pipelines onboarded in the past year exceed the total from the previous five years. Future plans include Apache Iceberg table format adoption and zero-touch schema evolution.

#backend #data-engineering #apache-kafka #apache-flink #change-data-capture

Today • 7m read time • From engineering.grab.com

Table of contents

Introduction
Background
The siloed past: A multi-platform hurdle
The Hugo evolution: A unified ingestion platform
Impact
Summary
What's next
Join us
