The D. E. Shaw group, a global investment and technology firm, replaced its previous observability platform with ClickHouse to handle high-cardinality metrics across millions of compute workloads on its internal HPC grid. During evaluation, ClickHouse ingested 3.5 million samples per second versus a competitor's 480k, and completed queries in seconds where competitors timed out. The production cluster now handles 530k records per second on average, with spikes over 1 million, and stores 68 TB of compressed metrics data. The platform supports capacity planning and compute efficiency analysis, and is expanding into distributed tracing via OpenTelemetry, with a 12.5x compression ratio. Key engineering decisions included preserving InfluxDB line protocol ingestion, building custom backfill tooling, and using materialized views for long-horizon aggregations.
Table of contents
Scaling the metrics platform
Choosing ClickHouse
A high-cardinality observability platform
From infrastructure to business impact
Expanding into tracing and beyond
Getting the most out of ClickHouse