A research paper proposing a new optimization method for 32-bit unsigned integer division by constants on 64-bit CPUs. The existing Granlund-Montgomery (GM) method, used by GCC, Clang, MSVC, and Apple Clang, yields code sequences designed for 32-bit CPUs that do not fully exploit 64-bit capabilities. In microbenchmarks, the proposed method achieves a 1.67x speedup on an Intel Xeon w9-3495X (Sapphire Rapids) and a 1.98x speedup on an Apple M4. A patch implementing the method has already been merged into LLVM main.
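To make the contrast concrete, here is a minimal sketch for the single divisor 7, whose magic constant needs 33 bits. The function names `div7_gm32`, `div7_wide64`, and `check_div7` are hypothetical. The first function is the well-known GM-style sequence with its add-and-shift fixup. The second shows one way a 64-bit target can avoid the fixup with a single wide multiply. It is an illustration only and not necessarily the sequence the paper or the LLVM patch emits. It also assumes the `unsigned __int128` extension available in GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* GM-style x / 7 for a 32-bit datapath.
 * The full magic number ceil(2^35 / 7) = 0x124924925 needs 33 bits,
 * so only its low 32 bits are multiplied and the missing top bit is
 * restored by the add-and-shift fixup. */
uint32_t div7_gm32(uint32_t x) {
    uint32_t t = (uint32_t)(((uint64_t)x * 0x24924925u) >> 32);
    return (t + ((x - t) >> 1)) >> 2;
}

/* Illustrative 64-bit variant (an assumption, not taken from the paper):
 * pre-shift the 33-bit magic number left by 29 so that the high 64 bits
 * of the 128-bit product equal (x * 0x124924925) >> 35. */
uint32_t div7_wide64(uint32_t x) {
    const uint64_t magic = UINT64_C(0x124924925) << 29;
    return (uint32_t)(((unsigned __int128)x * magic) >> 64);
}

/* Spot-check both variants against the hardware divide. */
void check_div7(void) {
    const uint32_t samples[] = {0u, 1u, 6u, 7u, 8u, 13u, 14u, 100u,
                                0x7FFFFFFFu, 0x80000000u,
                                0xFFFFFFF9u, 0xFFFFFFFFu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t x = samples[i];
        assert(div7_gm32(x) == x / 7u);
        assert(div7_wide64(x) == x / 7u);
    }
}
```

The second variant replaces the subtract, shift, add, and shift of the fixup with one multiply whose high half is the quotient. That is the kind of saving a 64-bit multiplier makes possible.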