A talk by the developer of Locally AI demonstrating how to run Gemma 4 on iPhone using Apple's MLX framework, achieving 40 tokens per second on the latest iPhones. Covers the MLX Swift LM GitHub repo for iOS/macOS integration, sourcing quantized models (4-bit to 8-bit) from the MLX Community on Hugging Face, and practical tips
•10m watch time