We stopped relying on bloom filters and now sort our ClickHouse primary key on a resource fingerprint. It cut our log query scans to 0.85% of blocks
A development team improved ClickHouse log query performance by replacing bloom filter data-skipping indexes with a deterministic resource fingerprint. They sort their primary key on a hash of the cluster, namespace, and pod, so logs from the same source are stored together in contiguous blocks and ClickHouse can skip blocks that cannot match a query. For single-namespace queries, this reduced the share of blocks scanned from nearly 100% to just 0.85% (222 of 26,135 blocks), substantially cutting I/O and latency. The team is now exploring ClickHouse's native JSON column type to further optimize GROUP BY operations.
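To see why sorting on a fingerprint lets the index skip most blocks, here is a rough Python simulation. It is a sketch under stated assumptions, not the team's implementation. The blake2b hash stands in for whatever hash function they actually use, which the post does not name. The block size, the 200-resource workload and the random arrival order are all made up. Pruning is modelled as a simple per-block [min, max] check, which is a simplification of ClickHouse's sparse primary index. The query filters on one full (cluster, namespace, pod) fingerprint rather than a whole namespace. The resulting percentages will not match the 0.85% reported above.

```python
import hashlib
import random

# Hypothetical block size for this simulation; ClickHouse's default
# index_granularity is 8192 rows, but a smaller value keeps the demo fast.
BLOCK_ROWS = 1024


def fingerprint(cluster: str, namespace: str, pod: str) -> int:
    """Deterministic 64-bit fingerprint of a log source (blake2b is an assumption)."""
    key = "\x00".join((cluster, namespace, pod)).encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def blocks_scanned(fps: list[int], target: int) -> tuple[int, int]:
    """Return (blocks that may contain target, total blocks).

    Simplified model of index pruning: a block is skipped only when the
    target lies outside that block's [min, max] fingerprint range.
    """
    blocks = [fps[i:i + BLOCK_ROWS] for i in range(0, len(fps), BLOCK_ROWS)]
    hits = sum(1 for b in blocks if min(b) <= target <= max(b))
    return hits, len(blocks)


# Hypothetical workload: 2 clusters x 10 namespaces x 10 pods, with rows
# from different sources interleaved in random arrival order.
resources = [(f"c{c}", f"ns{n}", f"pod{p}")
             for c in range(2) for n in range(10) for p in range(10)]
rng = random.Random(42)
arrival_order = [fingerprint(*rng.choice(resources)) for _ in range(200_000)]

target = fingerprint(*resources[0])
unsorted_hits, total = blocks_scanned(arrival_order, target)
sorted_hits, _ = blocks_scanned(sorted(arrival_order), target)

print(f"arrival order: {unsorted_hits}/{total} blocks")
print(f"sorted by fingerprint: {sorted_hits}/{total} blocks "
      f"({sorted_hits / total:.2%})")
```

In arrival order, almost every block mixes fingerprints from across the whole hash range, so almost no block can be ruled out. After sorting, one source's rows form a single contiguous run, and only the one to three blocks that run touches need to be read. The same grouping effect is what the post credits for its drop in blocks scanned.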