Uber has replaced its batch-based data lake ingestion with IngestionNext, a streaming-first platform built on Apache Kafka, Flink, and Apache Hudi. The new system reduces data ingestion latency from hours to minutes and cuts compute usage by roughly 25%. Key engineering challenges included managing small file proliferation in