AI coding agents like Claude Code and Codex run multi-turn loops with 10 to 50 or more tool calls. Over stateless HTTP, the full conversation history is retransmitted on every turn, so each request payload grows linearly with the turn count. OpenAI's WebSocket mode for the Responses API (launched Feb 2026) addresses this by caching conversation state on the server, so each continuation sends only a reference ID plus the new tool outputs. Benchmarks with GPT-5.4 show 82% less client-sent data and 29% faster end-to-end execution versus HTTP. The tradeoff is provider lock-in: WebSocket stateful continuation is currently OpenAI-only, while Anthropic, Google, OpenRouter, and local models still use stateless SSE/HTTP. The article also covers when HTTP remains preferable (few-turn tasks, multi-provider setups, serverless infrastructure) and frames the broader architectural question of whether the industry will standardize stateful LLM continuation.
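
As a rough illustration of why retransmitting history gets expensive, the sketch below models the total bytes a client sends under each approach. It is a toy calculation, not the article's benchmark: the function names, the 64-byte reference ID, and all payload sizes are hypothetical assumptions chosen only to show the shape of the growth.

```python
def stateless_bytes(turns: int, base: int, per_turn: int) -> int:
    """Total bytes sent when every request resends the full history.

    base     -- assumed size of the initial prompt/context, in bytes
    per_turn -- assumed size of each new tool output, in bytes
    """
    total = 0
    history = base
    for _ in range(turns):
        history += per_turn  # the new tool output joins the history
        total += history     # the whole history is sent again
    return total


def stateful_bytes(turns: int, base: int, per_turn: int, ref_id: int = 64) -> int:
    """Total bytes sent when the server caches the conversation state.

    ref_id -- assumed size of the reference to the cached state, in bytes
    """
    total = base  # the initial context is sent once
    for _ in range(turns):
        total += ref_id + per_turn  # only a reference plus the new output
    return total


if __name__ == "__main__":
    # Hypothetical sizes: 4 kB of initial context, 2 kB per tool output.
    for turns in (5, 20, 50):
        http = stateless_bytes(turns, base=4000, per_turn=2000)
        ws = stateful_bytes(turns, base=4000, per_turn=2000)
        print(f"{turns:>2} turns: stateless={http:>9,} B  stateful={ws:>8,} B")
```

In this model each stateless request grows linearly, so the cumulative total grows roughly with the square of the turn count, while the stateful total grows linearly. Real savings depend on actual payload sizes and protocol overhead, which this sketch does not attempt to capture.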

15m read time. From infoq.com
Table of contents
The Airplane Problem
The Agentic Coding Loop
The HTTP Overhead Problem
What Existing Benchmarks Show
Our Benchmark: Validating the Claims
Why It's Faster: The Architecture
Architectural Lessons
When HTTP Is Still the Right Choice
Conclusion
About the Author
