This post discusses the challenges of handling frequent updates in a data lake and introduces the Hudi format as a solution. It explains the configurations optimized for high- and low-throughput sources, how to connect to Kafka and RDS data sources, the importance of indexing for Hudi tables, and the impact of adopting Hudi.

7 min read · From engineering.grab.com
Table of contents

- Introduction
- High throughput source
- Low throughput source
- Connecting to our Kafka (unbounded) data source
- Connecting to our RDS (bounded) data source
- Indexing for Hudi tables
- Impact
- What’s next?
- References
