ClickHouse now supports direct querying of the Apache Iceberg and Delta Lake open table formats without requiring data migration. After two years of engineering, it offers three usage modes: query data in place on S3/GCS/Azure through catalogs like AWS Glue, Unity Catalog, and REST Catalog; accelerate analytics by loading data into ClickHouse's native MergeTree engine for sub-second query performance; and write results back to open formats for interoperability with tools like Spark, Trino, and DuckDB. Key capabilities include full DML support, time travel, schema evolution, partition pruning, and a new native Parquet reader delivering 1.8x faster reads. The post includes SQL examples querying 1.29 billion NYC taxi rows, in which query times drop from ~130 seconds when reading the lake files directly to ~14 seconds in MergeTree.
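To put the two reported figures in perspective, the short Python sketch below converts them into a speedup factor and a fraction of time saved. It uses only the numbers quoted above (~130 s vs. ~14 s, and 1.8x faster Parquet reads); the helper names are hypothetical, and because both timings are rounded, the results are approximate.

```python
# Approximate timings quoted for the 1.29-billion-row NYC taxi queries.
LAKE_SECONDS = 130.0      # reading the lake files directly (~130 s)
MERGETREE_SECONDS = 14.0  # same workload after loading into MergeTree (~14 s)
PARQUET_READ_FACTOR = 1.8  # reported speedup of the new native Parquet reader


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the `after_s` run is than `before_s`."""
    return before_s / after_s


def time_saved_fraction(factor: float) -> float:
    """Fraction of wall-clock time saved by an operation `factor` times faster."""
    return 1.0 - 1.0 / factor


mergetree_speedup = speedup(LAKE_SECONDS, MERGETREE_SECONDS)
parquet_saving = time_saved_fraction(PARQUET_READ_FACTOR)

print(f"MergeTree vs. lake files: ~{mergetree_speedup:.1f}x faster")
print(f"1.8x faster Parquet reads: ~{parquet_saving:.0%} less read time")
```

Running it prints roughly a 9.3x speedup for MergeTree and about 44% less time spent on Parquet reads.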
Table of contents
Introduction
The road to data lake ready
Three ways to use ClickHouse with your data lake
What’s supported
Conclusion
Get started