How a Single Line of Code Made a 24-core Server Slower Than a Laptop

Piotr Kołaczkowski wrote a program for a pleasingly parallel problem: each thread does its own independent piece of work, and the threads don't need to coordinate except to join their results at the end.
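The general shape of such a program can be sketched in Rust as follows. This is a minimal, hypothetical illustration of the pleasingly parallel pattern, not the author's actual program. The `work` function, the sum-of-squares workload, and the name `parallel_sum_of_squares` are assumptions chosen only to give each thread something independent to do. Each thread owns its chunk of the input, and the only coordination is the final `join` and sum.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical per-thread workload: sums the squares of the integers in `range`.
/// Each thread computes its own chunk with no shared mutable state.
fn work(range: Range<u64>) -> u64 {
    range.map(|x| x * x).sum()
}

/// Splits `0..n` into `threads` contiguous chunks, runs each chunk on its own
/// OS thread, and combines the partial results only after joining.
fn parallel_sum_of_squares(n: u64, threads: u64) -> u64 {
    assert!(threads > 0, "need at least one thread");
    let chunk = (n + threads - 1) / threads; // ceiling division
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let start = (i * chunk).min(n);
            let end = ((i + 1) * chunk).min(n);
            // Each thread receives its own range by value: nothing is shared.
            thread::spawn(move || work(start..end))
        })
        .collect();
    // The only coordination point: wait for every thread and add up the partials.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // Use as many threads as the OS reports available, falling back to 4.
    let threads = thread::available_parallelism()
        .map(|n| n.get() as u64)
        .unwrap_or(4);
    let total = parallel_sum_of_squares(1_000_000, threads);
    println!("{total}");
}
```

In a program with this structure, adding cores should scale throughput almost linearly, because the threads never touch each other's data until the end.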
Table of contents
Rune scripting
Benchmarking the benchmarking program
Running an empty loop on 24 cores
Investigation
The problem
The fix
Final results
Takeaways