OpenAI rebuilt its WebRTC stack to handle real-time voice AI at scale for over 900 million weekly active users. The core challenge was that the conventional one-port-per-session WebRTC model doesn't fit Kubernetes infrastructure well. Their solution splits packet routing from protocol termination using a relay plus transceiver architecture: a lightweight stateless relay layer handles UDP forwarding with a small public footprint, while stateful transceivers own all WebRTC session state (ICE, DTLS, SRTP). Routing is achieved by encoding destination metadata into the ICE username fragment (ufrag), enabling deterministic first-packet routing without hot-path lookups. The relay is written in Go using SO_REUSEPORT, thread pinning, and pre-allocated buffers for efficiency. A globally distributed relay fleet (Global Relay) reduces first-hop latency by placing ingress close to users worldwide, with Cloudflare geo-steering for signaling.
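The ufrag-routing idea can be sketched in a few lines of Go. This is an illustrative reconstruction, not OpenAI's actual wire format: the `cell.pod.nonce` layout and the `makeUfrag`/`parseUfrag` helpers are hypothetical names chosen for the example. The point is that the destination is recoverable from the very first STUN packet's USERNAME attribute, so the relay never consults a session table on the hot path.

```go
package main

import (
	"fmt"
	"strings"
)

// makeUfrag packs destination metadata into the ICE username fragment
// that the signaling layer hands to the client. The "cell.pod.nonce"
// layout here is an assumption for illustration.
func makeUfrag(cell, pod, nonce string) string {
	return strings.Join([]string{cell, pod, nonce}, ".")
}

// parseUfrag recovers the destination transceiver from a ufrag, letting
// a stateless relay forward the first packet deterministically with no
// lookup table or session state.
func parseUfrag(ufrag string) (cell, pod string, err error) {
	parts := strings.SplitN(ufrag, ".", 3)
	if len(parts) != 3 {
		return "", "", fmt.Errorf("malformed ufrag %q", ufrag)
	}
	return parts[0], parts[1], nil
}

func main() {
	u := makeUfrag("cell7", "pod42", "r9x1")
	cell, pod, err := parseUfrag(u)
	if err != nil {
		panic(err)
	}
	fmt.Printf("route %s -> cell=%s pod=%s\n", u, cell, pod)
}
```

In a real deployment the ufrag would also need to survive STUN message integrity checks and carry enough entropy to avoid collisions; those concerns are omitted here.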

14 min read · From openai.com
Table of contents

- WebRTC lets us make real-time AI products
- Choosing a media architecture
- The core deployment problem: WebRTC meets Kubernetes
- Architecture overview: relay + transceiver
- Routing on ICE credentials
- Global Relay and geo-steered signaling
- Relay implementation and performance
- Results and learnings
