How to build real-time voice AI on Android: a streaming STT, LLM, and TTS pipeline, barge-in interruption, and the race conditions nobody talks about.

ProAndroidDev

A deep dive into building production-quality, real-time conversational voice AI on Android that goes well beyond the basic STT→LLM→TTS pipeline. The article covers a five-state machine (IDLE, LISTENING, THINKING, SPEAKING, ERROR) built on Kotlin StateFlow, with collectLatest for clean cancellation; streaming STT over a Deepgram WebSocket, fed by an AudioRecord that uses the VOICE_COMMUNICATION source for hardware AEC (acoustic echo cancellation); a fix for a session ID race condition using AtomicInteger; WebSocket pre-warming to eliminate handshake latency; sentence-level TTS streaming that reduces perceived latency to 600–800 ms; WebRTC VAD-based barge-in detection, debounced to prevent false triggers; a single shared AudioRecord instance to avoid device-specific restart failures; and backchannel filler phrases to cover slow inference. Measured full-turn latency is 1.2–1.6 s, with barge-in response under 150 ms.
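The summary names the AtomicInteger session ID fix without showing it. One plausible reading is a generation counter: each new turn bumps the counter, and any callback that carries an older ID is dropped. The sketch below illustrates that idea only. `SessionGuard`, `TranscriptSink`, and `onTranscript` are hypothetical names, not taken from the article, and the listener thread is simulated with direct calls.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical guard (not the article's code): every new turn bumps a counter,
// and work tagged with an older ID is treated as stale.
class SessionGuard {
    private val current = AtomicInteger(0)

    /** Starts a new session and returns its ID, invalidating all older IDs. */
    fun begin(): Int = current.incrementAndGet()

    /** True only if [id] belongs to the most recently started session. */
    fun isCurrent(id: Int): Boolean = current.get() == id
}

// Hypothetical consumer of STT results. In a real app this would be called
// from a WebSocket listener thread; here it is called directly.
class TranscriptSink(private val guard: SessionGuard) {
    val accepted = mutableListOf<String>()

    /** Returns false and ignores [text] if it belongs to a superseded session. */
    fun onTranscript(sessionId: Int, text: String): Boolean {
        if (!guard.isCurrent(sessionId)) return false // late result from a cancelled turn
        accepted += text
        return true
    }
}

fun main() {
    val guard = SessionGuard()
    val sink = TranscriptSink(guard)

    val first = guard.begin()
    sink.onTranscript(first, "turn one")

    val second = guard.begin() // e.g. a barge-in starts a new turn
    sink.onTranscript(first, "late result from turn one") // dropped as stale
    sink.onTranscript(second, "turn two")

    println(sink.accepted) // prints [turn one, turn two]
}
```

The check and the list update are not atomic as a pair, so a production version would presumably also confine state changes to a single thread or dispatcher. The counter only makes stale results cheap to detect.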

Engineering Real-Time Conversational Voice AI on Android