Uber's Compliance Data Store team developed a config-driven archival and retrieval framework to manage terabytes of regulatory data efficiently. The system automatically moves data from hot storage (HDFS) to cold storage (S3) according to configurable policies and supports on-demand retrieval. The solution reduced manual intervention by 90% and handles more than 500 regulatory reports. It addresses challenges such as schema evolution, data consistency during backfills, and resource optimization. The framework uses MySQL for metadata management and Apache Airflow for orchestration, and it includes a user-friendly interface for self-service data retrieval.
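
To make the idea of a config-driven archival policy concrete, here is a minimal Python sketch. It is not Uber's implementation: the `ArchivalPolicy` class, its `hot_retention_days` field, and the `partitions_to_archive` helper are hypothetical names chosen for illustration, and the sketch assumes that a policy simply marks date partitions older than a retention window as due to move from hot to cold storage.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivalPolicy:
    """Hypothetical per-report policy; field names are illustrative only."""
    report_name: str
    hot_retention_days: int  # how long a partition stays in hot storage


def partitions_to_archive(policy, partition_dates, today):
    """Return, in ascending order, partition dates older than the retention window."""
    cutoff = today - timedelta(days=policy.hot_retention_days)
    # A partition dated exactly on the cutoff is still inside the window.
    return sorted(d for d in partition_dates if d < cutoff)


# Example: a 90-day hot-retention policy evaluated on 2024-06-30.
policy = ArchivalPolicy(report_name="example_report", hot_retention_days=90)
partitions = [date(2024, 6, 1), date(2024, 1, 1), date(2024, 3, 15)]
due = partitions_to_archive(policy, partitions, today=date(2024, 6, 30))
```

In a setup like the one described, an orchestrator task could run a check of this kind per report on a schedule and hand the resulting partitions to a copy step, with the outcome recorded in a metadata store. Those steps are outside this sketch.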