Jaeger v2.18.0 introduces native ClickHouse support as an alpha storage backend for distributed traces. The author, a Jaeger maintainer, explains why ClickHouse's columnar OLAP architecture suits telemetry workloads: benchmarks on 10 million spans across 1 million traces show 8.6× compression (reducing ~6 GiB to ~722 MiB), ingestion throughput above 50k spans/sec, trace retrieval around 100 ms, and most search queries under 50 ms. Key schema decisions include sorting by (service_name, name, start_time) rather than trace_id to optimize search performance, using a bloom filter skip index and a materialized view on trace_id timestamps to recover trace retrieval speed, storing typed attributes via ClickHouse Nested columns, and using materialized views to precompute service names, operations, and trace time ranges. The integration also enables native Service Performance Monitoring (SPM) by computing latency, call rates, and error rates directly from stored spans without an external metrics pipeline.
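To make the SPM idea concrete, here is a minimal sketch of deriving call rate, error rate, and a latency percentile directly from span records. It is plain Python for illustration only, not Jaeger's implementation; in Jaeger the aggregation would presumably run as queries against ClickHouse. The `Span` fields, the `spm_metrics` helper, and the nearest-rank p95 are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span record; not Jaeger's actual schema.
    service_name: str
    name: str           # operation name
    duration_ms: float
    is_error: bool


def spm_metrics(spans, window_seconds):
    """Group spans by (service, operation) and compute RED-style metrics."""
    groups = {}
    for span in spans:
        groups.setdefault((span.service_name, span.name), []).append(span)

    result = {}
    for key, group in groups.items():
        durations = sorted(s.duration_ms for s in group)
        # Nearest-rank percentile: smallest value covering 95% of samples.
        rank = max(1, math.ceil(0.95 * len(durations)))
        errors = sum(1 for s in group if s.is_error)
        result[key] = {
            "call_rate": len(group) / window_seconds,  # calls per second
            "error_rate": errors / len(group),         # fraction of failed calls
            "p95_ms": durations[rank - 1],
        }
    return result
```

Because every input comes from the spans themselves, no separate metrics pipeline is needed; the trade-off is that the aggregation cost moves to query time, which is where a columnar store is expected to help.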