This post shows how to build a conversational voice AI bot with sub-second response latency using Modal's serverless platform, the Pipecat framework, and open-source models. The implementation achieves ~1 second voice-to-voice latency by combining the Parakeet STT model, the Qwen3 LLM served with vLLM, and the Kokoro TTS model. Key optimizations include using Modal Tunnels to bypass the input plane, WebRTC for client connections, regional pinning to minimize network latency, and independent autoscaling of the GPU services. The demo is a RAG-powered assistant for Modal's documentation, with structured outputs and animated avatars.
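The ~1 second figure is best thought of as a budget spread across the pipeline stages. A minimal sketch of how such a budget adds up — the per-stage numbers below are illustrative assumptions for this kind of STT → LLM → TTS pipeline, not measurements from this post:

```python
# Illustrative voice-to-voice latency budget (all numbers are assumptions,
# not measured values). The streaming pipeline overlaps stages, so what
# matters is time-to-first-output at each stage, not total generation time.
budget_ms = {
    "vad_and_turn_detection": 200,   # confirming the user stopped speaking
    "stt_parakeet": 150,             # speech-to-text transcription
    "llm_ttft_qwen3_vllm": 300,      # time to first token from the LLM
    "tts_kokoro_first_audio": 150,   # time to first synthesized audio chunk
    "network_webrtc": 100,           # client <-> server transport overhead
}

total_ms = sum(budget_ms.values())
print(f"Estimated voice-to-voice latency: {total_ms} ms")
```

Framing latency this way makes the optimizations in the post legible: regional pinning and Modal Tunnels attack the network line item, while streaming STT/LLM/TTS services shrink each stage's time-to-first-output.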
Table of contents
- Conversational Voice AI Applications
- Why Modal and Pipecat work so well together for Voice AI
- Voice-to-Voice Latency
- Building a Conversational Voice AI for Modal’s Docs
- Testing Performance
- Deploy Your Own Conversational Voice AI on Modal
- Bonus: Animating Modal’s Mascots Moe and Dal