Uber built HiveSync, a sharded batch replication system that synchronizes 350PB of Hive/HDFS data across multiple regions. The system processes 5 million Hive events and replicates 8PB of data daily, using an event-driven architecture, DAG-based orchestration, and hybrid RPC/DistCp strategies. HiveSync separates control and data

3m read time · From infoq.com