Redpanda's engineering team details how they applied Profile-Guided Optimization (PGO) and BOLT to Redpanda Streaming, achieving a ~50% reduction in p50 latency and ~15% lower CPU utilization. The post explains the mechanics of PGO (two-phase compilation with instrumented profiling) versus BOLT (a post-link binary optimizer) and why the team chose PGO over BOLT for production. It then uses Top-Down Microarchitecture Analysis (TMA) with Linux perf to show that Redpanda was heavily frontend-bound before PGO (51%), dropping to ~38% afterward. Binary heatmaps visualize how PGO concentrates hot code paths, improving instruction cache locality and reducing iTLB pressure. PGO ships in the 26.1 release.
Table of contents
Profile-guided optimization and BOLT
Benchmark: lower latencies, less CPU usage
Analyzing PGO performance improvements
Try Redpanda Streaming