This post walks through building a conversational voice AI bot with sub-second response latency using Modal's serverless platform, the Pipecat framework, and open-source models. The implementation achieves roughly 1 second of voice-to-voice latency by combining Parakeet for speech-to-text (STT), the Qwen3 LLM served with vLLM, and Kokoro for text-to-speech (TTS). Key optimizations include using Modal Tunnels to

14 min read. From modal.com
Table of contents
Conversational Voice AI Applications
Why Modal and Pipecat work so well together for Voice AI
Voice-to-Voice Latency
Building a Conversational Voice AI for Modal’s Docs
Testing Performance
Deploy Your Own Conversational Voice AI on Modal
Bonus: Animating Modal’s Mascots Moe and Dal
