Redpanda's engineering team details how they applied Profile-Guided Optimization (PGO) and BOLT to Redpanda Streaming, achieving a ~50% reduction in p50 latency and ~15% lower CPU utilization. The post explains the mechanics of PGO (two-phase compilation with instrumented profiling) versus BOLT (a post-link binary optimizer) and why the team chose PGO over BOLT for production. It then uses Top-Down Microarchitecture Analysis (TMA) with Linux perf to show that Redpanda was heavily frontend-bound (51%) before PGO, dropping to ~38% after. Binary heatmaps visualize how PGO concentrates hot code paths, which improves instruction cache locality and reduces iTLB pressure. PGO ships in the 26.1 release.
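
To put the frontend-bound figures quoted above in perspective, here is a minimal sketch of the arithmetic. It treats the approximate ~38% figure as exactly 38, and the helper name `frontend_bound_change` is hypothetical, introduced only for illustration.

```python
def frontend_bound_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before_pct - after_pct
    relative = absolute / before_pct * 100.0
    return absolute, relative


# Figures from the summary: 51% frontend-bound before PGO, ~38% after.
absolute, relative = frontend_bound_change(51.0, 38.0)
print(f"{absolute:.0f} points absolute, {relative:.1f}% relative")
# prints "13 points absolute, 25.5% relative"
```

Under that assumption, PGO removes roughly a quarter of the pipeline slots that were previously lost to frontend stalls.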

9 min read · From redpanda.com
Table of contents
Profile-guided optimization and BOLT
Benchmark: lower latencies, less CPU usage
Analyzing PGO performance improvements
Try Redpanda Streaming
