See more: https://x.com/adrgrondin/status/2040512861953270226

Speaker info:
- https://x.com/adrgrondin

AI Engineer

In this talk, Adrien Grondin, the developer of Locally AI, demonstrates how to run Gemma 4 on iPhone using Apple's MLX framework, reaching 40 tokens per second on the latest iPhones. The talk covers the MLX Swift LM GitHub repo for iOS/macOS integration, sourcing quantized models (4-bit to 8-bit) from the MLX Community on Hugging Face, and practical tips on choosing a model for on-device inference. It also touches on tool calling support, the broader MLX ecosystem (VLM, audio, video), and the recent acquisition of Locally AI by LM Studio.
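
To make the 4-bit to 8-bit trade-off concrete, here is a rough back-of-the-envelope sketch of how quantization width affects the size of a model's weights. It is not taken from the talk: the function name `weight_bytes` and the 4-billion-parameter count are hypothetical, chosen only for illustration, and the estimate ignores quantization scales, the KV cache, and runtime overhead.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in bytes.

    Ignores quantization scales/zero-points, KV cache, and runtime overhead,
    so real memory use on a device will be higher.
    """
    return num_params * bits_per_weight / 8


# Hypothetical parameter count for illustration; not a figure from the talk.
PARAMS = 4_000_000_000

for bits in (4, 6, 8):
    gb = weight_bytes(PARAMS, bits) / 1e9
    print(f"{bits}-bit: ~{gb:.1f} GB of weights")
```

Under these assumptions, halving the bit width halves the weight footprint, which is why lower-bit variants are usually the first candidates for phones with limited memory.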

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI