26M-parameter function-calling model that runs on incredibly small devices - cactus-compute/needle

Hacker News

Needle is a 26-million-parameter function-calling model distilled from Gemini 3.1 using a 'Simple Attention Network' architecture. It is designed for edge devices such as phones, watches, and glasses, and reaches 6000 tokens/sec for prefill and 1200 tokens/sec for decode. On single-shot function-calling tasks it outperforms larger models such as FunctionGemma-270m and Qwen-0.6B. The weights are fully open, and the model can be finetuned locally through a web UI or a CLI. It was pretrained on 200B tokens using 16 TPU v6e units, then post-trained on 2B tokens of function-call data.
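
To put the quoted throughput and training figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the decode figure is also in tokens/sec, that prefill and decode times simply add, and that there is no other overhead; the function names and the example prompt and output sizes are hypothetical, not taken from the project.

```python
# Figures quoted in the summary above.
PREFILL_TOKENS_PER_SEC = 6000.0
DECODE_TOKENS_PER_SEC = 1200.0   # assumption: unit is tokens/sec
PARAMETERS = 26e6
PRETRAIN_TOKENS = 200e9


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate end-to-end latency in milliseconds.

    Simplified model (an assumption): time to read the prompt at the
    prefill rate plus time to generate the output at the decode rate.
    """
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return 1000.0 * (prefill_s + decode_s)


def tokens_per_parameter(train_tokens: float, parameters: float) -> float:
    """Ratio of training tokens to model parameters."""
    return train_tokens / parameters


if __name__ == "__main__":
    # Hypothetical call: 300 tokens of prompt plus tool schema, 60 tokens of output.
    print(f"latency: {estimate_latency_ms(300, 60):.0f} ms")
    ratio = tokens_per_parameter(PRETRAIN_TOKENS, PARAMETERS)
    print(f"pretraining tokens per parameter: {ratio:.0f}")
```

Under these assumptions, a 300-token prompt with a 60-token function call comes out at roughly a tenth of a second, and the pretraining budget works out to several thousand tokens per parameter. Real latency will depend on the device and runtime.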