MLX is an array framework for Apple Silicon, essentially PyTorch for your Mac, and this talk is a tour of what it can run: real-time vision models that describe the world around you, sub-100ms text-to-speech, speech-to-speech pipelines, omni models that take image and audio input together, and video generation from a text prompt on a machine with 16GB of unified memory. A recent breakthrough called TurboQuant cuts KV cache memory by 4x and gets a 1M-token context window running fully on device. Community projects include a native voice app, a robot that speaks in real time with a cloned voice, and a system that chains video generations into a coherent story, all without a single cloud call.
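To make the 4x KV cache figure concrete, here is a back-of-the-envelope sketch of KV cache size. It assumes a standard transformer cache that stores one key vector and one value vector per layer, per KV head, per token, and it uses a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) chosen only for illustration. This is plain arithmetic, not TurboQuant itself: it shows how dropping from 16 to 4 bits per stored value yields a 4x reduction, and it ignores the scales and other metadata that real quantization schemes also store.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: int) -> int:
    """Approximate size of a transformer KV cache in bytes.

    The factor of 2 accounts for one key tensor plus one value tensor per layer.
    Ignores quantization scales/zero-points and any padding.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return values * bits_per_value // 8

# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT = 1_000_000  # 1M tokens

fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 16)
q4 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 4)

print(f"16-bit cache: {fp16 / 1e9:.1f} GB")  # 16-bit cache: 131.1 GB
print(f"4-bit cache:  {q4 / 1e9:.1f} GB")    # 4-bit cache:  32.8 GB
print(f"reduction:    {fp16 / q4:.1f}x")     # reduction:    4.0x
```

For this particular shape even the 4-bit cache is large at a million tokens, so whether a 1M-token context fits on a given device depends heavily on the model's layer count and number of KV heads.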

The underlying argument: the cloud assumption doesn't hold everywhere. Not for someone in Africa on an unreliable connection. Not for a local agent that needs to stay on. Not for a robot that has to hear, see, and respond without phoning home.

Speaker info:
- https://x.com/Prince_Canuma
- https://pl.linkedin.com/in/prince-canuma

AI Engineer

A conference talk by Prince Canuma (Arcee) covering MLX, Apple's array framework for on-device AI on Apple Silicon. The speaker demonstrates running large vision and language models (including Gemma 4) entirely on MacBooks and iPhones without an internet connection, using MLX VLM for real-time image analysis, MLX Audio for text-to-speech and speech recognition, and a modular speech pipeline. Highlights include 1.5M downloads and 4,000+ ported models, TurboQuant enabling 1M-token context windows on device, and community projects spanning robotics, video generation, and native Swift apps. The talk is motivated by a personal story about building accessibility tools for a blind family member in a region with poor connectivity.
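One way to read "modular speech pipeline" is as a chain of swappable stages: speech recognition, a language model that writes the reply, and text-to-speech. The sketch below illustrates only that shape. The stage types and the stub functions (`fake_stt`, `fake_llm`, `fake_tts`) are hypothetical stand-ins, not MLX Audio or MLX VLM APIs; in a real app each stub would wrap an on-device model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; these are NOT MLX Audio or MLX VLM APIs.
SpeechToText = Callable[[bytes], str]
Responder = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class SpeechPipeline:
    """Chains three swappable stages: audio in -> transcript -> reply -> audio out."""
    stt: SpeechToText
    respond: Responder
    tts: TextToSpeech

    def run(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)   # speech recognition
        reply = self.respond(transcript)  # language model turn
        return self.tts(reply)            # speech synthesis


# Stub stages standing in for real on-device models (hypothetical).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechPipeline(stt=fake_stt, respond=fake_llm, tts=fake_tts)
print(pipeline.run(b"hello"))  # b'You said: hello'
```

The design choice this illustrates is isolation: replacing the text-to-speech model, for example with a cloned voice, means swapping one callable while the recognition and reply stages stay untouched.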

MLX Genmedia — Prince Canuma, Arcee