Grab's engineering team shares how they evolved Hugo, their internal self-service data platform, by replacing a fragmented toolchain (Kafka Connect, Sprinkler, Spark) with Apache Flink as a unified ingestion engine. The new architecture supports one-click MySQL CDC pipelines and self-service Kafka ingestion into Hive tables via S3. Key improvements include reducing onboarding time from days to minutes (Kafka ~6 min, MySQL CDC ~3 min), automated schema detection replacing manual Protobuf-to-Avro mappings, and eliminating intermediary Kafka hops for CDC. Adoption has surged: new pipelines onboarded in the past year exceed the total from the previous five years. Future plans include Apache Iceberg table format adoption and zero-touch schema evolution.
Table of contents
Introduction
Background
The siloed past: A multi-platform hurdle
The Hugo evolution: A unified ingestion platform
Impact
Summary
What’s next
Join us