Today, we're previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple's machine learning framework.

Ollama is previewing a new backend for Macs with Apple silicon, built on Apple's MLX framework, that delivers significantly faster inference. The update adds NVFP4 quantization support, which aims for higher model accuracy at lower memory usage, and improves KV caching with smarter eviction and intelligent checkpoints for coding and agentic workloads. The preview accelerates the Qwen3.5-35B-A3B model, which is optimized for coding tasks, and works with tools like Claude Code and OpenClaw. A Mac with more than 32GB of unified memory is required.
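
As a rough illustration of why 4-bit quantization matters for fitting a model of this size in unified memory, the sketch below estimates weight storage only. It assumes the NVFP4 layout as publicly described by NVIDIA (4-bit values sharing one 8-bit scale per 16-value block), takes the 35B total parameter count from the model name, and pretends every weight is quantized, which real checkpoints may not do. The function name `quantized_weight_gb` is hypothetical and is not part of Ollama or MLX.

```python
def quantized_weight_gb(
    n_params: float,
    bits_per_value: int = 4,   # assumed: 4-bit element format
    block_size: int = 16,      # assumed: values per shared scale
    scale_bits: int = 8,       # assumed: one 8-bit scale per block
) -> float:
    """Estimate weight storage in decimal gigabytes for block-scaled quantization."""
    # Each weight costs its own bits plus its share of the block scale.
    bits_per_weight = bits_per_value + scale_bits / block_size
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 35e9  # assumed from the "35B" in the model name
    print(f"4-bit block-scaled: {quantized_weight_gb(n_params):.1f} GB")
    print(f"16-bit baseline:    {n_params * 16 / 8 / 1e9:.1f} GB")
```

Under these assumptions the weights come to roughly 20 GB, compared with 70 GB at 16-bit precision. The KV cache, activations, and the operating system all need memory on top of that, so this is a lower bound and not an explanation of the stated requirement.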

Ollama is now powered by MLX on Apple Silicon in preview

Fastest performance on Apple silicon, powered by MLX

NVFP4 support: higher quality responses and production parity