A WebRTC expert who built SFUs at Twitch and Discord argues that OpenAI made a mistake by using WebRTC for voice AI. Key criticisms: WebRTC aggressively drops audio packets to minimize latency (bad for AI prompts, where accuracy matters more than latency), has no meaningful buffering for TTS audio, requires complex port-per-connection hacks at scale, and needs 8+ round trips to establish a connection. The author proposes WebSockets as a simpler starting point, then QUIC/WebTransport as the proper long-term solution. The QUIC advantages covered include 1-RTT connection setup, CONNECTION_ID-based routing that survives IP changes, stateless load balancing via QUIC-LB (which encodes the backend server ID into the connection ID), and an anycast + unicast topology for architectures with no load balancer at all.
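The stateless load balancing idea mentioned above can be illustrated with a minimal sketch. It assumes a simplified, unencrypted layout in which the first two bytes of the connection ID hold the backend server ID and the remaining bytes are random. The names `SERVER_ID_LEN`, `NONCE_LEN`, `make_connection_id`, `route`, and the backend addresses are all hypothetical. The actual QUIC-LB draft defines its own wire format and optional encryption, which this sketch does not attempt to reproduce.

```python
import os

# Assumed layout for illustration only (not the QUIC-LB wire format):
#   [ server ID: 2 bytes ][ random nonce: 6 bytes ]
SERVER_ID_LEN = 2
NONCE_LEN = 6


def make_connection_id(server_id: int) -> bytes:
    """Backend side: issue a connection ID that embeds this server's ID."""
    return server_id.to_bytes(SERVER_ID_LEN, "big") + os.urandom(NONCE_LEN)


def route(connection_id: bytes, backends: dict[int, str]) -> str:
    """Load balancer side: pick a backend from the connection ID alone.

    No per-connection table is consulted, so the packet's source IP and
    port can change without affecting the routing decision.
    """
    if len(connection_id) < SERVER_ID_LEN:
        raise ValueError("connection ID too short to contain a server ID")
    server_id = int.from_bytes(connection_id[:SERVER_ID_LEN], "big")
    return backends[server_id]


# Hypothetical backend pool, keyed by server ID.
BACKENDS = {1: "10.0.0.1:4433", 2: "10.0.0.2:4433"}

cid = make_connection_id(2)
chosen = route(cid, BACKENDS)
```

Because the routing decision depends only on bytes carried in every packet, any load balancer instance can forward any packet to the right backend, which is what lets the balancer stay stateless.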
Table of contents
Me
Product Fit
WebRTC is too aggressive
TTS is faster than real-time
Ports Ports Ports
Hacks by Necessity
Round Trips and U
Forking the Protocol
But What Instead?
Connection ID
Stateless Load Balancing
Anycast + Unicast
Summary
To Be Fair
Me