Tool calling with closed-source models is seamless: you pass a list of functions to the API, the model calls them, and you get structured JSON back.

Tool calling with open-source LLMs is fragmented because every model family uses a different wire format for encoding function calls. Inference engines like vLLM, SGLang, and TensorRT-LLM each write custom parsers per model, and grammar engines like Outlines and XGrammar independently reverse-engineer the same format knowledge. This creates an M×N duplication problem: N models times M implementations of the same format logic. The proposed solution is a declarative spec that captures each model's wire format — boundary tokens, argument serialization, reasoning token behavior — so both grammar engines and output parsers can consume it without redundant reverse-engineering work every time a new model ships.
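To make the idea of a declarative spec concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `WireFormatSpec` fields, the `parse_tool_calls` helper, and the boundary tokens are illustrative assumptions and do not come from any real model, inference engine, or the proposed spec itself. It shows only the parser side: one generic function that reads boundary tokens, argument serialization, and reasoning-token markers from data instead of hard-coding them per model.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WireFormatSpec:
    """Hypothetical declarative description of one model family's tool-call format."""
    name: str
    call_start: str          # boundary token that opens a tool call
    call_end: str            # boundary token that closes a tool call
    arg_serialization: str   # only "json" is handled in this sketch
    reasoning_start: Optional[str] = None  # reasoning span markers, if the model has any
    reasoning_end: Optional[str] = None


def parse_tool_calls(spec: WireFormatSpec, text: str) -> list[dict]:
    """Generic output parser driven entirely by the spec (assumed design, not a real API)."""
    if spec.arg_serialization != "json":
        raise NotImplementedError(spec.arg_serialization)
    # Drop reasoning spans first so tool-call-like text inside them is ignored.
    if spec.reasoning_start and spec.reasoning_end:
        reasoning = re.escape(spec.reasoning_start) + r".*?" + re.escape(spec.reasoning_end)
        text = re.sub(reasoning, "", text, flags=re.DOTALL)
    call = re.escape(spec.call_start) + r"(.*?)" + re.escape(spec.call_end)
    return [json.loads(body) for body in re.findall(call, text, flags=re.DOTALL)]


# Two made-up formats; the tokens are illustrative, not taken from any real model.
SPEC_A = WireFormatSpec("family-a", "<tool_call>", "</tool_call>", "json", "<think>", "</think>")
SPEC_B = WireFormatSpec("family-b", "[CALL]", "[/CALL]", "json")

OUTPUT_A = (
    '<think>maybe <tool_call>{"name": "noop", "arguments": {}}</tool_call></think>'
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'
)
OUTPUT_B = '[CALL]{"name": "get_weather", "arguments": {"city": "Oslo"}}[/CALL]'

calls_a = parse_tool_calls(SPEC_A, OUTPUT_A)
calls_b = parse_tool_calls(SPEC_B, OUTPUT_B)
```

Under these assumptions, supporting a new model family means adding one spec entry rather than one parser per engine. A grammar engine could in principle read the same fields to constrain decoding, though that side is not sketched here.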

Tool calling, open source, and the M×N problem

What “supporting a model” actually means

Generic parsers are swimming against the current