KTransformers 0.5.3 has been released, adding AVX2-only inference support for Mixture of Experts (MoE) models. This enables BF16, FP8, and GPTQ-INT4 workloads on CPUs that lack AVX-512 or AMX (such as Intel Core/Ultra consumer processors), making local LLM inference viable on a broader range of hardware. The release also includes NUMA-aware deployment improvements for multi-socket systems, lower idle CPU overhead, speculative decoding enhancements, and other fixes.

From phoronix.com