I wrote 84 new matmul kernels to improve llamafile CPU performance.

Hacker News

LLaMA Now Goes Faster on CPUs: New matrix multiplication kernels written for llamafile speed up prompt evaluation. The gains are largest for certain weight formats and on specific CPU types: ARMv8.2+, Intel Alder Lake, and AVX512. The new kernels outperform MKL for matrices that fit in L2 cache and offer the potential for faster evaluation.
