OpenAI recently outlined how it adapted WebRTC for low-latency voice AI at global scale. The new architecture replaced a conventional media termination model with a relay-transceiver design better sui

InfoQ

OpenAI engineers Yi Zhang and William McDonald describe how they redesigned the company's WebRTC infrastructure to support low-latency voice AI at global scale. Instead of conventional direct UDP exposure or TURN-style relays, they split responsibilities into two layers: a lightweight stateless relay that only forwards packets, and a dedicated transceiver that owns all stateful WebRTC machinery (ICE negotiation, DTLS handshakes, SRTP encryption). This separation concentrates protocol complexity in one place, avoids exposing large public port ranges in Kubernetes, and scales better for predominantly 1:1 user-to-model sessions than SFU architectures, which are designed for multi-party conferencing. The design underpins ChatGPT voice and the Realtime API.
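
To make the relay/transceiver split concrete, the following is a minimal Python sketch of how a forwarding layer could choose a transceiver without holding any session state. It is not OpenAI's implementation, and the summary above does not say how their relay routes packets. Two parts are standard: classifying a UDP payload by its first byte (RFC 7983) and reading the USERNAME attribute from a STUN message (RFC 8489). Everything else is an assumption made for illustration: the idea that the first four characters of the server-side ICE username fragment identify a transceiver, the `routes` table, and the function names.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442   # fixed value in every STUN header (RFC 8489)
STUN_ATTR_USERNAME = 0x0006      # USERNAME attribute type (RFC 8489)
TRANSCEIVER_ID_LEN = 4           # hypothetical: ufrag prefix naming a transceiver

Address = Tuple[str, int]


def classify(packet: bytes) -> str:
    """Classify a UDP payload by its first byte, following RFC 7983."""
    if not packet:
        return "unknown"
    first = packet[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        return "rtp"  # RTP or RTCP; SRTP uses the same range
    return "unknown"


def stun_username(packet: bytes) -> Optional[str]:
    """Return the USERNAME attribute of a STUN message, or None if absent."""
    if len(packet) < 20:
        return None
    _msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if cookie != STUN_MAGIC_COOKIE:
        return None
    offset = 20
    end = min(20 + msg_len, len(packet))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        value = packet[offset + 4: offset + 4 + attr_len]
        if attr_type == STUN_ATTR_USERNAME:
            if len(value) != attr_len:
                return None  # truncated attribute
            return value.decode("utf-8", errors="replace")
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def pick_transceiver(packet: bytes, routes: Dict[str, Address]) -> Optional[Address]:
    """Pick a transceiver for an incoming STUN check (illustrative only).

    In an ICE connectivity check sent by the client, USERNAME has the form
    "<server ufrag>:<client ufrag>". ASSUMPTION: the server ufrag starts
    with a transceiver ID that the relay can look up in a static table.
    """
    if classify(packet) != "stun":
        return None
    username = stun_username(packet)
    if username is None:
        return None
    server_ufrag = username.split(":", 1)[0]
    return routes.get(server_ufrag[:TRANSCEIVER_ID_LEN])
```

The sketch covers only the initial STUN check. How the later DTLS and SRTP packets of a session reach the same transceiver is not described in the summary and is left out here. The point of the example is the division of labor: the relay inspects a few header bytes and forwards, while ICE, DTLS, and SRTP state would live entirely in the transceiver.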

OpenAI Outlines WebRTC Architecture for Low-Latency Voice AI at Scale