A deep investigation into a counterintuitive performance regression in which Rust code built at the O3 optimization level runs 2x slower than the same code built at O2. The author discovers the cause in a binary search implementation. There, LLVM aggressively replaces conditional jumps with conditional moves (CMOV). This eliminates branch mispredictions, but it turns each comparison into a data dependency, so every iteration has to wait for the previous one's result before it can proceed. Using flamegraphs, assembly analysis, and the uiCA pipeline simulator, the article shows how a compiler optimization can backfire when the instruction-level dependencies it introduces become the bottleneck.
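The following minimal sketch makes the CMOV-versus-jump distinction concrete with two lower-bound binary searches in Rust. It is not the author's code, and the function names `lower_bound_branchy` and `lower_bound_branchless` are hypothetical. Both functions return the same index. The first uses an explicit `if`/`else`, which LLVM may lower to either a conditional jump or a CMOV. The second is written as a plain select on each step, the shape that LLVM often turns into a CMOV on x86-64. In that form, the address of the next load depends on the result of the previous load and comparison.

```rust
/// Lower bound with an explicit if/else. Depending on the optimization
/// level, LLVM may compile this to a conditional jump (speculated by the
/// branch predictor) or to a conditional move.
fn lower_bound_branchy(data: &[u32], target: u32) -> usize {
    let (mut lo, mut hi) = (0usize, data.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if data[mid] < target {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Lower bound written as a select on each step, a shape that commonly
/// becomes a CMOV. There is nothing to mispredict, but `base` for the next
/// iteration depends on this iteration's load, forming a serial chain.
fn lower_bound_branchless(data: &[u32], target: u32) -> usize {
    if data.is_empty() {
        return 0;
    }
    let mut base = 0usize;
    let mut size = data.len();
    // Invariant: the answer lies in [base, base + size], and base + size <= len.
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        base = if data[mid] < target { mid } else { base };
        size -= half;
    }
    // One final comparison decides between base and base + 1.
    base + (data[base] < target) as usize
}

fn main() {
    let v = vec![1u32, 3, 5, 7];
    for t in [0u32, 4, 5, 8] {
        println!(
            "target {t}: branchy = {}, branchless = {}",
            lower_bound_branchy(&v, t),
            lower_bound_branchless(&v, t)
        );
    }
}
```

Both versions do the same work. Only the generated instructions differ, so comparing their assembly at each optimization level is a direct way to see which form the compiler picked.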
Table of contents
What the hell am I doing?
Benchmarking is hard
Going deeper
And deeper…
… And deeper
Skill issue?
What’s now?