Ollama is previewing a new backend, powered by Apple's MLX framework, that delivers significantly faster inference on Apple silicon Macs. The update adds NVFP4 quantization support, which gives higher model accuracy with lower memory usage, and improves KV caching with smarter eviction and intelligent checkpoints for coding and agentic workloads. The preview accelerates the Qwen3.5-35B-A3B model, which is optimized for coding tasks, and works with tools like Claude Code and OpenClaw. A Mac with more than 32GB of unified memory is required.
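
The memory requirement can be checked before trying the preview. The following is a minimal sketch and not part of Ollama: the helper names `meets_memory_requirement` and `read_unified_memory_bytes` are hypothetical, and it assumes that "32GB" means 32 GiB (the figure a 32GB Mac reports) and that "more than" is a strict comparison. It reads the installed memory through the macOS `sysctl -n hw.memsize` command.

```python
import subprocess

# Assumption: "32GB" is interpreted as 32 GiB, the value a 32GB Mac reports.
REQUIRED_BYTES = 32 * 1024**3


def meets_memory_requirement(total_bytes: int) -> bool:
    """Return True only when memory is strictly more than 32 GiB."""
    return total_bytes > REQUIRED_BYTES


def read_unified_memory_bytes() -> int:
    """Read installed memory in bytes on macOS via `sysctl -n hw.memsize`."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

On the target Mac, `meets_memory_requirement(read_unified_memory_bytes())` returns `True` only for machines above the stated threshold. Under the strict reading assumed here, a Mac with exactly 32GB does not qualify.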

From ollama.com (3m read time)
Table of contents
Fastest performance on Apple silicon, powered by MLX
NVFP4 support: higher quality responses and production parity
Improved caching for more responsiveness
Get started
Future models
Acknowledgments
