A hands-on exploration of AVX-512 SIMD programming through an implementation of K-Means clustering for image segmentation. The author benchmarks scalar, auto-vectorized, and hand-written intrinsics code; the intrinsics version achieves a 7-8.5x speedup over scalar code (about half the theoretical 16x) and runs 4x faster than the compiler's auto-vectorized code. Compares SIMD's

13m read time From shihab-shahriar.github.io
Table of contents
Benchmark Problem
Baseline(s)
AVX-512
Final Thoughts
Appendix
Appendix 2: LLMs
