Uber has replaced its batch-based data lake ingestion with IngestionNext, a streaming-first platform built on Apache Kafka, Flink, and Apache Hudi. The new system reduces data ingestion latency from hours to minutes and cuts compute usage by roughly 25%. Key engineering challenges included managing small file proliferation in

From infoq.com