A RocksDB unit test added four years ago to verify the quality of random number sources eventually exposed a hardware bug in AMD CPUs. The test generated thousands of 128-bit identifiers using std::random_device across many threads and checked that they were all unique. It began failing intermittently. Investigation revealed that, under heavy memory load on specific cores, the RDSEED instruction on certain AMD processors returns 0 while signaling success far more often than chance would allow. The failures affected only libstdc++ (GCC), not libc++ (clang). AMD acknowledged the issue, assigned it a high-severity CVE, and announced a CPU microcode fix. The story highlights the value of testing your dependencies and building in redundancy, and it shows that even CPUs can have subtle bugs.
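
The kind of uniqueness check described above can be sketched in C++ roughly as follows. This is a minimal illustration, not RocksDB's actual test code. The names `GenerateId` and `CountDuplicateIds`, the choice to build each 128-bit ID from four 32-bit `std::random_device` draws, and the thread and ID counts are all assumptions made for this example. With a healthy entropy source, a collision among a few thousand 128-bit values is astronomically unlikely, so any duplicate points to a defect in the random source, such as an instruction that keeps returning 0 while reporting success.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <random>
#include <set>
#include <thread>
#include <vector>

// Hypothetical 128-bit identifier: four 32-bit words.
using Id128 = std::array<std::uint32_t, 4>;

// Hypothetical helper: fills each 32-bit word with one std::random_device draw.
// If the underlying source returned 0 on every call, every ID would be all zeros.
Id128 GenerateId(std::random_device& rd) {
  Id128 id{};
  for (auto& word : id) {
    word = static_cast<std::uint32_t>(rd());
  }
  return id;
}

// Hypothetical test driver: each of num_threads threads generates
// ids_per_thread IDs; returns how many IDs had already been seen.
std::size_t CountDuplicateIds(int num_threads, int ids_per_thread) {
  std::set<Id128> seen;    // std::array provides operator<, so it can be a set key
  std::size_t duplicates = 0;
  std::mutex mu;           // guards `seen` and `duplicates`
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      std::random_device rd;  // one device per thread
      for (int i = 0; i < ids_per_thread; ++i) {
        Id128 id = GenerateId(rd);
        std::lock_guard<std::mutex> lock(mu);
        if (!seen.insert(id).second) {
          ++duplicates;  // insert() reports false when the ID already exists
        }
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return duplicates;
}
```

For example, `CountDuplicateIds(4, 1000)` should return 0 on a working system. A single mutex-protected `std::set` keeps the sketch short; a larger test would likely use per-thread sets merged at the end to reduce lock contention.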

5-minute read from rocksdb.org
Table of contents:
- Background: Unique Identifiers
- High Quality Randomness
- Trust But Verify
- That’s Weird
- Root Cause Analysis
- With Apologies
- Key Takeaways