Deep Dive into Apache Iceberg Architecture: The Three Layers That Power Your Lakehouse

In my previous post, “Why Apache Iceberg Is Not ‘Just Another Table Format’”, I explored the strategic …


Snowflake Community

A detailed breakdown of Apache Iceberg's three-layer architecture: the Catalog Layer (a thin pointer to the current metadata file, which enables atomic commits), the Metadata Layer (metadata files, manifest lists, and manifest files, which enable hierarchical query pruning), and the Data Layer (open-format Parquet files in cloud object storage). The post explains how these layers work together to deliver ACID guarantees, time travel, schema evolution, and query performance at petabyte scale. It also covers catalog implementations (Snowflake Open Catalog/Polaris, AWS Glue, Nessie, Hive Metastore), column-level statistics for file pruning, and the Copy-on-Write vs. Merge-on-Read trade-off for deletes.
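To make hierarchical pruning concrete, here is a minimal Python sketch. It assumes the catalog has already resolved the current metadata file and that file has already selected the current snapshot's manifest list, so the sketch starts at the manifest list. The names `DataFile`, `Manifest`, `ManifestListEntry`, and `plan_scan` are hypothetical and are not the Iceberg API. Real manifests and manifest lists are Avro files, real bounds are stored in serialized binary form, and real manifest-list summaries are kept per partition field. The sketch shows only the two-level skip: whole manifests ruled out by partition summaries, then individual data files ruled out by column-level min/max statistics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataFile:
    """One data file plus the per-column min/max stats a manifest records for it."""
    path: str
    lower_bounds: dict[str, int]
    upper_bounds: dict[str, int]


@dataclass
class Manifest:
    """Hypothetical stand-in for a manifest file: a list of data files."""
    path: str
    files: list[DataFile]


@dataclass
class ManifestListEntry:
    """Points at one manifest and summarizes the range of a partition
    column across every data file tracked by that manifest."""
    manifest: Manifest
    partition_column: str
    partition_min: int
    partition_max: int


def overlaps(lo: int, hi: int, q_lo: int, q_hi: int) -> bool:
    """True if the closed ranges [lo, hi] and [q_lo, q_hi] intersect."""
    return lo <= q_hi and q_lo <= hi


def plan_scan(manifest_list: list[ManifestListEntry], column: str,
              q_lo: int, q_hi: int) -> list[str]:
    """Return paths of data files that may hold rows with q_lo <= column <= q_hi."""
    selected = []
    for entry in manifest_list:
        # Level 1: skip a whole manifest when its partition summary rules it out.
        if entry.partition_column == column and not overlaps(
            entry.partition_min, entry.partition_max, q_lo, q_hi
        ):
            continue
        # Level 2: skip individual files using column-level min/max statistics.
        for f in entry.manifest.files:
            lo = f.lower_bounds.get(column)
            hi = f.upper_bounds.get(column)
            if lo is None or hi is None:
                selected.append(f.path)  # no stats for this column: must read it
            elif overlaps(lo, hi, q_lo, q_hi):
                selected.append(f.path)
    return selected


# Usage with made-up files partitioned by a hypothetical "day" column.
m1 = Manifest("m1.avro", [
    DataFile("a.parquet", {"day": 1}, {"day": 5}),
    DataFile("b.parquet", {"day": 6}, {"day": 10}),
])
m2 = Manifest("m2.avro", [DataFile("c.parquet", {"day": 11}, {"day": 20})])
manifest_list = [
    ManifestListEntry(m1, "day", 1, 10),
    ManifestListEntry(m2, "day", 11, 20),
]
print(plan_scan(manifest_list, "day", 7, 9))  # ['b.parquet']
```

Files without statistics for the filtered column are always kept, because min/max pruning can only prove that a file cannot match; it can never prove that a file does match.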


2a. Metadata Files (The Root of Each Snapshot)
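A simplified Python sketch of how a metadata file serves as the root of each snapshot: it lists the table's snapshots, records which one is current, and keeps a snapshot log, while each snapshot points to its own manifest list. The field names `current-snapshot-id`, `snapshots`, `snapshot-log`, `snapshot-id`, `timestamp-ms`, and `manifest-list` follow the Iceberg table spec, but the JSON is heavily trimmed (real metadata files also carry schemas, partition specs, and more) and all values are made up. The helper functions are hypothetical illustrations, not a library API.

```python
import json

# Trimmed, made-up metadata file: only the fields needed to resolve a snapshot.
METADATA_JSON = """
{
  "format-version": 2,
  "current-snapshot-id": 3,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1000, "manifest-list": "snap-1.avro"},
    {"snapshot-id": 2, "timestamp-ms": 2000, "manifest-list": "snap-2.avro"},
    {"snapshot-id": 3, "timestamp-ms": 3000, "manifest-list": "snap-3.avro"}
  ],
  "snapshot-log": [
    {"snapshot-id": 1, "timestamp-ms": 1000},
    {"snapshot-id": 2, "timestamp-ms": 2000},
    {"snapshot-id": 3, "timestamp-ms": 3000}
  ]
}
"""


def manifest_list_for(metadata: dict, snapshot_id: int) -> str:
    """Follow a snapshot ID to the manifest list that describes its files."""
    for snap in metadata["snapshots"]:
        if snap["snapshot-id"] == snapshot_id:
            return snap["manifest-list"]
    raise KeyError(f"unknown snapshot {snapshot_id}")


def current_manifest_list(metadata: dict) -> str:
    """A normal read starts from the snapshot marked as current."""
    return manifest_list_for(metadata, metadata["current-snapshot-id"])


def manifest_list_as_of(metadata: dict, timestamp_ms: int) -> str:
    """Time travel: use the snapshot that was current at timestamp_ms,
    i.e. the latest snapshot-log entry at or before that time."""
    earlier = [e for e in metadata["snapshot-log"] if e["timestamp-ms"] <= timestamp_ms]
    if not earlier:
        raise ValueError("table had no current snapshot at that time")
    latest = max(earlier, key=lambda e: e["timestamp-ms"])
    return manifest_list_for(metadata, latest["snapshot-id"])


metadata = json.loads(METADATA_JSON)
print(current_manifest_list(metadata))      # snap-3.avro
print(manifest_list_as_of(metadata, 2500))  # snap-2.avro
```

In this model, time travel simply starts the same manifest-list, manifest, and data-file walk from an older snapshot; no data files need to be copied or rewritten.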

How the Layers Work Together: A Query Example