A detailed breakdown of Apache Iceberg's three-layer architecture: the Catalog Layer (thin pointer to current metadata, enabling atomic commits), the Metadata Layer (metadata files, manifest lists, and manifest files enabling hierarchical query pruning), and the Data Layer (open-format Parquet files in cloud object storage). Explains how these layers work together to deliver ACID guarantees, time travel, schema evolution, and petabyte-scale query performance. Covers catalog implementations (Snowflake Open Catalog/Polaris, AWS Glue, Nessie, Hive Metastore), column-level statistics for file pruning, and the Copy-on-Write vs Merge-on-Read trade-off for deletes.
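The hierarchical pruning mentioned above can be illustrated with a minimal sketch. This is a toy in-memory model, not Iceberg's actual file format or any library API: the names `DataFile`, `Manifest`, `overlaps`, and `plan_files` are hypothetical, and a single integer column stands in for the per-column statistics. The idea is that a query first skips whole manifests whose summary bounds cannot match the filter, then skips individual data files using their min/max values.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    lower: int  # smallest value of the filtered column in this file
    upper: int  # largest value of the filtered column in this file


@dataclass
class Manifest:
    files: list
    # Summary bounds covering every file in this manifest (a toy stand-in
    # for the per-manifest summaries kept one level up in the hierarchy).
    lower: int = field(init=False)
    upper: int = field(init=False)

    def __post_init__(self):
        self.lower = min(f.lower for f in self.files)
        self.upper = max(f.upper for f in self.files)


def overlaps(lower, upper, lo, hi):
    """True if the range [lower, upper] intersects the query range [lo, hi]."""
    return lower <= hi and upper >= lo


def plan_files(manifest_list, lo, hi):
    """Return paths of data files that may contain values in [lo, hi]."""
    selected = []
    for manifest in manifest_list:
        # First level: skip the whole manifest without opening its entries.
        if not overlaps(manifest.lower, manifest.upper, lo, hi):
            continue
        # Second level: skip individual files using their min/max bounds.
        for f in manifest.files:
            if overlaps(f.lower, f.upper, lo, hi):
                selected.append(f.path)
    return selected


manifest_list = [
    Manifest([DataFile("a.parquet", 1, 100), DataFile("b.parquet", 101, 200)]),
    Manifest([DataFile("c.parquet", 201, 300), DataFile("d.parquet", 301, 400)]),
]
print(plan_files(manifest_list, 150, 250))
```

In this sketch only the files whose bounds intersect the filter are ever read; a real engine applies the same kind of test at each metadata level before touching any data in object storage.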

10m read time. From medium.com
Table of contents
2a. Metadata Files (The Root of Each Snapshot)
2b. Manifest Lists (The Snapshot Index)
2c. Manifest Files (The File Registry)
How the Layers Work Together: A Query Example
