ChatGPT Advanced Voice Mode isn’t interrupting just you. Interruptions, and turn-taking in general, are unsolved problems for all Voice AI agents. Nobody likes being cut short – and people have much less patience for machines than they do for other humans. Turn-taking failures take many forms (e.g., the agent interrupts the user, the agent mistakes a cough for an interruption), and all of them lead to users immediately hanging up the phone.

In this talk, we use human conversation as a framework for understanding both today’s approaches to turn detection and where the field is headed. You’ll learn how linguists think about turn detection in human dialogue, what’s working (and what’s broken) in current methods, and how we might build Voice AIs that interrupt you less than your human brother.

About Tom Shapland
Tom Shapland, PhD, is a Product Manager at LiveKit, an open-source platform for building, deploying, and scaling realtime multimodal agents. He's passionate about the multimodal future of human-computer interfaces. Before LiveKit, he was the cofounder of a Voice AI observability platform (Canonical AI) and an agriculture technology startup (Tule, YC S14). He lives in the East Bay and coaches lacrosse for his two kids.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

AI Engineer

Voice AI systems struggle with interruptions because they rely on simple voice activity detection (VAD), which checks only whether speech is present and how long a silence has lasted. Humans predict when a speaker's turn will end from semantic content, syntax, and prosody, and respond in about 200 milliseconds; current AI instead makes a basic speech-or-no-speech decision and waits for roughly half a second of silence. Newer approaches augment VAD with semantic models that analyze conversation context, while full-duplex models process input and generate output simultaneously, as human minds do. Production systems, however, still favor enhanced cascading pipelines over full-duplex models because they offer better control and reliability.
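
To make the difference concrete, here is a minimal Python sketch of the two endpointing strategies the summary describes: a plain silence timeout, and the same timeout stretched when a semantic score says the user is probably not finished. It is an illustration under stated assumptions, not LiveKit's or any vendor's implementation. The 20 ms frame size, the 1.5 second extended wait, the 0.5 probability cutoff, and the names `TurnDetector` and `toy_eou_model` are all hypothetical; only the half-second silence threshold comes from the text. A real system would get its speech flags from a VAD model and its end-of-utterance probability from a trained classifier, and would re-score the transcript as it grows.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

FRAME_MS = 20  # assumed audio frame length; not specified in the source


@dataclass
class TurnDetector:
    """Silence-timeout endpointing, optionally stretched by a semantic score."""

    silence_ms: int = 500            # half-second threshold mentioned in the summary
    extended_silence_ms: int = 1500  # assumed longer wait for unfinished utterances
    eou_threshold: float = 0.5       # assumed cutoff on end-of-utterance probability
    eou_model: Optional[Callable[[str], float]] = None  # hypothetical semantic model

    def end_of_turn_ms(self, frames: Iterable[bool], transcript: str = "") -> Optional[int]:
        """Return the time in ms at which the turn is declared over, or None.

        `frames` holds one VAD decision per frame (True = speech).
        Simplification: the transcript is scored once, not re-scored per frame.
        """
        required = self.silence_ms
        if self.eou_model is not None and self.eou_model(transcript) < self.eou_threshold:
            required = self.extended_silence_ms  # user looks mid-thought: wait longer

        heard_speech = False
        silent_ms = 0
        for i, is_speech in enumerate(frames):
            if is_speech:
                heard_speech = True
                silent_ms = 0  # any speech resets the silence timer
            elif heard_speech:
                silent_ms += FRAME_MS
                if silent_ms >= required:
                    return (i + 1) * FRAME_MS
        return None  # never saw speech followed by enough silence


def toy_eou_model(transcript: str) -> float:
    """Hypothetical stand-in for a semantic model: a dangling last word means 'not done'."""
    dangling = {"and", "but", "so", "because", "um", "uh", "the", "to"}
    words = transcript.lower().rstrip(".?!").split()
    if not words:
        return 0.0
    return 0.1 if words[-1] in dangling else 0.9


if __name__ == "__main__":
    # 200 ms of speech, a 600 ms thinking pause, 200 ms more speech, then silence.
    frames = [True] * 10 + [False] * 30 + [True] * 10 + [False] * 100

    vad_only = TurnDetector()
    semantic = TurnDetector(eou_model=toy_eou_model)

    print(vad_only.end_of_turn_ms(frames))                                # 700
    print(semantic.end_of_turn_ms(frames, "I want to book a flight to"))  # 2500
```

In this toy input, the VAD-only detector declares the turn over at 700 ms, in the middle of the user's pause, which is exactly the kind of interruption the talk describes. The semantic variant sees a transcript ending in "to", waits longer, and does not fire until 2500 ms, after the user has resumed and finished speaking.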

Why ChatGPT Keeps Interrupting You — Dr. Tom Shapland, LiveKit