When an LLM agent calls a tool like kubectl, a response that starts as ~4KB of protobuf can balloon to ~12K tokens in the model's context, because it is serialized to text along the way. The root cause is that agent-tool wire protocols are string-typed, and models have no learned representation for binary formats. The common workaround is piping tool output through filters like jq to cut the token count. The author argues that a typed binary channel between tools and models would fix the problem at the source, with data-analysis agents using Apache Arrow as a plausible starting point, but notes there is little pressure to build it because the jq workaround is good enough.
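To make the size gap concrete, here is a minimal sketch comparing the same structured payload serialized as JSON text (roughly what an agent sees from `kubectl get pods -o json`) versus Apache Arrow IPC bytes. It assumes pyarrow is installed; the pod records are fabricated stand-ins rather than real kubectl output, and the 4-characters-per-token figure is a rough heuristic, not a real tokenizer.

```python
# Sketch: the same records as JSON text vs. Arrow IPC bytes.
# Pod data is fabricated for illustration; token count is a
# rough ~4-chars-per-token estimate, not a real tokenizer.
import json
import pyarrow as pa
import pyarrow.ipc

# Fake pod records standing in for `kubectl get pods -o json` output.
pods = [
    {"name": f"web-{i}", "namespace": "default", "restarts": i % 3,
     "status": "Running", "node": f"node-{i % 4}"}
    for i in range(200)
]

# Text path: what actually lands in the model's context today.
json_text = json.dumps({"items": pods}, indent=2)

# Typed path: the same data as an Arrow table in the IPC stream format.
table = pa.Table.from_pylist(pods)
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
arrow_bytes = sink.getvalue()

print(f"JSON text: {len(json_text)} chars (~{len(json_text) // 4} tokens)")
# A typed channel would hand these bytes to the model directly,
# so they would never pass through the tokenizer at all.
print(f"Arrow IPC: {arrow_bytes.size} bytes")
```

On a run like this, the JSON text is typically several times larger than the Arrow payload, and the tokenizer pays for every brace, quote, and repeated key; that repeated structural overhead is exactly what a typed channel would eliminate.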