The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel's particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point. I don't actually have any inside information, but it's fun to speculate,…

FGiesen's platform is a resource for graphics programmers and computer graphics enthusiasts. Its articles, tutorials, and technical discussions cover rendering techniques and algorithms, GPU programming, shading languages, and graphics hardware architectures. Readers can learn about rendering pipelines, optimization strategies, and advanced rendering effects.

Ryg blog

The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, the oldest of which date back to the Pentium MMX. The post examines how these operations are implemented and what idiosyncrasies they show across successive instruction set extensions, including MMX, SSE, SSE2, SSSE3, SSE4.1, and AVX-512. It covers instructions such as PMULLW, PMULHW, and PMADDWD, describing their purpose, historical context, and technical details. It also touches on how these operations evolved and how Intel and AMD have adapted their multiplier designs over time.
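For readers who don't have these instructions memorized, here is a minimal scalar sketch in C of what a single lane of PMULLW, PMULHW, and PMADDWD computes. It is not taken from the post; the function names (`pmullw_lane`, `pmulhw_lane`, `pmaddwd_pair`) are hypothetical, and the code models only the arithmetic, not the vector registers or anything about how the hardware multipliers are built. It assumes a two's complement target where right-shifting a negative `int32_t` is an arithmetic shift, which holds on mainstream x86 compilers.

```c
#include <stdint.h>

/* PMULLW, one lane: multiply two 16-bit values and keep the low 16 bits
 * of the 32-bit product. The low half is the same bit pattern whether the
 * inputs are treated as signed or unsigned, so it is returned as uint16_t. */
uint16_t pmullw_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)p & 0xFFFFu);
}

/* PMULHW, one lane: signed 16x16 multiply, keep the high 16 bits.
 * Assumption: >> on a negative int32_t is an arithmetic shift. */
int16_t pmulhw_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >> 16);
}

/* PMADDWD, one 32-bit output lane: two signed 16x16 products from
 * adjacent 16-bit lanes, summed into 32 bits. The sum only exceeds the
 * int32_t range when all four inputs are -32768, in which case it wraps;
 * the 64-bit intermediate avoids signed overflow in this model. */
int32_t pmaddwd_pair(int16_t a0, int16_t b0, int16_t a1, int16_t b1) {
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    return (int32_t)(uint32_t)sum;
}
```

A full 32-bit product of two 16-bit lanes can be pieced together from the PMULLW and PMULHW results for the same inputs, which is one reason the low/high pair exists as separate instructions.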

Why those particular integer multiplies?

What about AVX-512 VPMULLQ, IFMA or VNNI?