OpenAI engineers Yi Zhang and William McDonald describe how they redesigned their WebRTC infrastructure to support low-latency voice AI at global scale. Instead of conventional direct UDP exposure or TURN-style relays, they split responsibilities into two layers: a lightweight stateless relay that forwards packets, and a dedicated transceiver that owns all stateful WebRTC machinery (ICE negotiation, DTLS handshakes, SRTP encryption). This separation keeps complexity concentrated in one place, avoids large public port ranges in Kubernetes, and scales better for predominantly 1:1 user-to-model sessions, in contrast to SFU architectures, which are designed for multi-party conferencing. The design underpins ChatGPT voice and the Realtime API.
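
As a rough illustration of why the stateful work can sit in a single component: ICE (STUN), DTLS, and SRTP packets for a session normally share one UDP flow, and RFC 7983 tells them apart by the first byte of each datagram. The sketch below shows only that classification step. It is not OpenAI's code; the function name `classify_packet` and the returned labels are hypothetical, and the summary does not say how the relay or transceiver are actually implemented.

```python
def classify_packet(datagram: bytes) -> str:
    """Label a UDP payload using the first-byte ranges from RFC 7983.

    Hypothetical helper for illustration only; a real transceiver would
    hand each class to its ICE agent, DTLS stack, or SRTP context.
    """
    if not datagram:
        return "empty"
    first = datagram[0]
    if 0 <= first <= 3:
        return "stun"      # ICE connectivity checks
    if 20 <= first <= 63:
        return "dtls"      # handshake and key exchange
    if 128 <= first <= 191:
        return "srtp"      # encrypted RTP/RTCP media
    return "unknown"       # other ranges (e.g. ZRTP, TURN channel data)
```

Under this assumption, a relay in front of the transceiver would not need any of this logic: it could pass datagrams through unchanged and leave the protocol state to the transceiver.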

3m read time. From infoq.com