Needle is a 26-million parameter function-call model distilled from Gemini 3.1 using a 'Simple Attention Network' architecture. Designed for edge devices like phones, watches, and glasses, it achieves 6000 tokens/sec prefill and 1200 decode speed. It outperforms larger models like FunctionGemma-270m and Qwen-0.6B on single-shot function calling tasks. Weights are fully open-source, and the model can be finetuned locally via a web UI or CLI. Pretrained on 200B tokens using 16 TPU v6e units, with post-training on 2B tokens of function call data.

3m read timeFrom github.com
Post cover image
Table of contents
QuickstartUsage (Python)FinetuningCLI

Sort: