A detailed breakdown of Apache Iceberg's three-layer architecture: the Catalog Layer (thin pointer to current metadata, enabling atomic commits), the Metadata Layer (metadata files, manifest lists, and manifest files enabling hierarchical query pruning), and the Data Layer (open-format Parquet files in cloud object storage). Explains how these layers work together to deliver ACID guarantees, time travel, schema evolution, and petabyte-scale query performance. Covers catalog implementations (Snowflake Open Catalog/Polaris, AWS Glue, Nessie, Hive Metastore), column-level statistics for file pruning, and the Copy-on-Write vs Merge-on-Read trade-off for deletes.
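The way the three layers cooperate can be sketched in a few lines of Python. This is a toy in-memory model, not the Iceberg library or its API: the names `Catalog`, `Snapshot`, `Manifest`, `DataFile`, and `plan_scan` are hypothetical, the catalog is reduced to a compare-and-swap on a single pointer, and a single integer column stands in for the column-level statistics. A real manifest list stores partition summaries rather than the per-manifest min/max derived here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataFile:
    """Data layer: one Parquet file plus min/max stats for one column."""
    path: str
    lower: int
    upper: int


@dataclass
class Manifest:
    """Metadata layer: a manifest file registering a group of data files."""
    files: List[DataFile]

    @property
    def lower(self) -> int:
        return min(f.lower for f in self.files)

    @property
    def upper(self) -> int:
        return max(f.upper for f in self.files)


@dataclass
class Snapshot:
    """Metadata layer: stands in for a manifest list (one per snapshot)."""
    manifests: List[Manifest]


class Catalog:
    """Catalog layer: a thin pointer from table name to current snapshot."""

    def __init__(self) -> None:
        self._current: Dict[str, Snapshot] = {}

    def load(self, table: str) -> Optional[Snapshot]:
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[Snapshot], new: Snapshot) -> bool:
        # Compare-and-swap: succeed only if nobody else committed in between.
        if self._current.get(table) is not expected:
            return False
        self._current[table] = new
        return True


def plan_scan(snapshot: Snapshot, lo: int, hi: int) -> List[str]:
    """Return paths of data files whose value range overlaps [lo, hi]."""
    selected = []
    for manifest in snapshot.manifests:
        # First pruning level: skip whole manifests that cannot match.
        if manifest.upper < lo or manifest.lower > hi:
            continue
        # Second pruning level: skip individual files using their stats.
        for f in manifest.files:
            if f.upper >= lo and f.lower <= hi:
                selected.append(f.path)
    return selected
```

In this sketch a writer builds a new `Snapshot` and calls `commit` with the snapshot it started from; a stale `expected` value makes the commit fail, which is the property that lets a single pointer swap act as an atomic commit. A reader calls `load` and then `plan_scan`, touching only the files whose statistics overlap the filter.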
Table of contents
2a. Metadata Files (The Root of Each Snapshot)
2b. Manifest Lists (The Snapshot Index)
2c. Manifest Files (The File Registry)
How the Layers Work Together: A Query Example