ClickHouse Cloud introduces index sharding (distributed index analysis) for SharedMergeTree tables, distributing primary and secondary index analysis across replicas using consistent hashing. Previously, every replica loaded the full index into working memory — at petabyte scale this could mean 100–400 GiB per replica. With index sharding, each replica owns only a 1/N slice of the index, capping total fleet-wide index memory at the size of the index itself regardless of replica count. Benchmarks on a 50 billion row table show up to 7.7× faster index analysis for bloom filter lookups, 7.2× for vector search, and 4.3× for primary key range queries. The feature activates automatically when tables exceed configurable thresholds (default: 10 parts and 1 GB of index data). It is currently available in private preview on ClickHouse Cloud.
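The "1/N slice" property comes from consistent hashing: index entries are mapped onto a hash ring, and each replica owns the keys that fall on its ring segments, so adding or removing a replica only reshuffles a small fraction of the index. The actual ClickHouse implementation is not shown in this excerpt; the following is a minimal, hypothetical sketch of the idea, with made-up replica and part names for illustration.

```python
import hashlib
from bisect import bisect
from collections import Counter

def h64(key: str) -> int:
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Toy consistent-hash ring: each replica is placed at many
    virtual-node positions; a key is owned by the first replica
    clockwise from the key's hash."""
    def __init__(self, replicas, vnodes=100):
        self.ring = sorted(
            (h64(f"{r}#vnode{i}"), r) for r in replicas for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    def owner(self, key: str) -> str:
        idx = bisect(self.keys, h64(key)) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical example: shard index analysis for 1000 parts over 3 replicas.
parts = [f"part_{i}" for i in range(1000)]
ring3 = ConsistentHashRing(["replica-1", "replica-2", "replica-3"])
load = Counter(ring3.owner(p) for p in parts)   # each replica gets roughly 1/3

# Adding a fourth replica moves only ~1/4 of the parts to the newcomer;
# assignments among the original three are largely preserved.
ring4 = ConsistentHashRing(["replica-1", "replica-2", "replica-3", "replica-4"])
moved = sum(1 for p in parts if ring3.owner(p) != ring4.owner(p))
```

This is the property the post relies on for elastic scaling: total index memory across the fleet stays bounded by the index size, and replica membership changes invalidate only a proportional slice of the assignment, not the whole mapping.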

14 min read · From clickhouse.com
Table of contents

- How index analysis works
- The bottleneck: indexes don't scale horizontally alongside replicas
- Index sharding and the core concepts
- What happens when a replica is added to the service?
- How does ClickHouse protect itself from failure to analyze?
- How does index sharding reduce memory usage on replicas?
- How does index sharding increase analysis performance?
- What kind of workloads will this benefit the most?
