Stateful Continuation for AI Agents: Why Transport Layers Now Matter

AI coding agents like Claude Code and Codex run multi-turn loops with 10-50+ tool calls, causing HTTP payloads to grow linearly as full conversation history is retransmitted each turn. OpenAI's WebSocket mode for the Responses API (launched Feb 2026) solves this by caching server-side state, so each continuation only sends a reference ID plus new tool outputs. Benchmarks with GPT-5.4 show 82% less client-sent data and 29% faster end-to-end execution versus HTTP. The tradeoff is provider lock-in — WebSocket stateful continuation is currently OpenAI-only, with Anthropic, Google, OpenRouter, and local models still using stateless SSE/HTTP. The article also covers when HTTP remains preferable (few-turn tasks, multi-provider setups, serverless infrastructure) and frames the broader architectural question of whether the industry will standardize stateful LLM continuation.

#openai

#ai-agents

#websocket

Apr 08•15m read time•From infoq.com

Table of contents

The Airplane Problem The Agentic Coding Loop The HTTP Overhead Problem What Existing Benchmarks Show Our Benchmark: Validating the Claims Why It's Faster: The Architecture Architectural Lessons When HTTP Is Still the Right Choice Conclusion About the Author

Comment

Bookmark

Copy

Sort: