A RocksDB unit test, added four years ago to check the quality of random number sources, eventually exposed a hardware bug in AMD CPUs. The test generated thousands of 128-bit identifiers with std::random_device across many threads and checked that they were all unique. It began failing intermittently. Investigation revealed that under heavy memory load on specific cores, the RDSEED instruction on certain AMD processors returns 0 while still setting its 'success' flag. A healthy entropy source should almost never produce that value, yet these processors did so far more often. The failures appeared only with libstdc++ (GCC's C++ standard library), whose std::random_device draws from RDSEED on CPUs that support it, and not with libc++ (LLVM's). AMD acknowledged the issue, assigned it a high-severity CVE, and announced a CPU microcode fix. The story highlights the value of testing dependencies and building in redundancy, and it shows that even CPUs can have subtle bugs.
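The uniqueness check described above can be sketched roughly as follows. This is a hypothetical reconstruction, not RocksDB's actual test. The names `MakeId` and `CountDuplicateIds` and the choice to build each 128-bit ID from four 32-bit `std::random_device` draws are assumptions made for illustration. The same goes for the thread and ID counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical 128-bit identifier stored as two 64-bit halves.
using Id128 = std::pair<std::uint64_t, std::uint64_t>;

// Draws 32 bits from the device. result_type is unsigned int, usually
// 32 bits wide; the mask keeps exactly 32 bits even if it is wider.
std::uint64_t Draw32(std::random_device& rd) {
  return static_cast<std::uint64_t>(rd()) & 0xFFFFFFFFu;
}

// Assumed construction (hypothetical): one 128-bit ID from four 32-bit draws.
Id128 MakeId(std::random_device& rd) {
  std::uint64_t hi = Draw32(rd) << 32;
  hi |= Draw32(rd);
  std::uint64_t lo = Draw32(rd) << 32;
  lo |= Draw32(rd);
  return {hi, lo};
}

// Generates num_threads * ids_per_thread IDs concurrently and returns how
// many repeated an ID already seen. Each thread fills only its own vector,
// so no mutex is needed; the results are merged after all threads join.
std::size_t CountDuplicateIds(int num_threads, int ids_per_thread) {
  assert(num_threads >= 0 && ids_per_thread >= 0);
  std::vector<std::vector<Id128>> per_thread(
      static_cast<std::size_t>(num_threads));
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&per_thread, t, ids_per_thread] {
      std::random_device rd;  // one device per thread
      auto& ids = per_thread[static_cast<std::size_t>(t)];
      ids.reserve(static_cast<std::size_t>(ids_per_thread));
      for (int i = 0; i < ids_per_thread; ++i) {
        ids.push_back(MakeId(rd));
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }

  // Count every ID that was already present in the set.
  std::set<Id128> seen;
  std::size_t duplicates = 0;
  for (const auto& ids : per_thread) {
    for (const auto& id : ids) {
      if (!seen.insert(id).second) {
        ++duplicates;
      }
    }
  }
  return duplicates;
}
```

On healthy hardware, a call such as `CountDuplicateIds(8, 10000)` should return 0. With a source that sometimes returns 0 while reporting success, any ID whose four draws all came back 0 would be all zeros, and a second such ID would be counted as a duplicate. Collecting per-thread results and merging them at the end keeps the threads independent, so contention on a shared set cannot mask or distort the behavior of the entropy source.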
Table of contents
Background: Unique Identifiers
High Quality Randomness
Trust But Verify
That’s Weird
Root Cause Analysis
With Apologies
Key Takeaways