Cloudflare has released an experimental `@cloudflare/voice` package for its Agents SDK, enabling real-time voice interactions over WebSockets. Developers can add continuous speech-to-text (STT) and text-to-speech (TTS) to existing Durable Object-based agents in roughly 30 lines of server-side code. The pipeline integrates Workers AI providers (Deepgram Flux, Nova 3, Aura) with no external API keys required, supports React hooks and a framework-agnostic VoiceClient, and includes a Twilio adapter for phone call handling. The architecture keeps voice as just another input channel to the same agent, preserving SQLite-backed conversation history, tool access, and scheduling across text and voice modalities. Provider interfaces are intentionally minimal to allow third-party STT, TTS, and telephony integrations.
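
To make the "voice as just another input channel" idea concrete, here is a minimal, self-contained sketch. It does not use the real `@cloudflare/voice` API: `SpeechToText`, `TextToSpeech`, `SketchAgent`, and the fake providers are hypothetical names invented for illustration, an in-memory array stands in for the SQLite-backed history, and the calls are synchronous where real providers would presumably stream audio asynchronously.

```typescript
// Hypothetical minimal provider interfaces (assumption: NOT the @cloudflare/voice API).
interface SpeechToText {
  transcribe(audio: Uint8Array): string;
}
interface TextToSpeech {
  synthesize(text: string): Uint8Array;
}

type Channel = "text" | "voice";
type Turn = { role: "user" | "assistant"; channel: Channel; content: string };

class SketchAgent {
  // In-memory stand-in for the agent's persisted conversation history.
  readonly history: Turn[] = [];
  private stt: SpeechToText;
  private tts: TextToSpeech;

  constructor(stt: SpeechToText, tts: TextToSpeech) {
    this.stt = stt;
    this.tts = tts;
  }

  // Single code path shared by both channels: record the turn, produce a reply.
  private respond(input: string, channel: Channel): string {
    this.history.push({ role: "user", channel, content: input });
    const reply = `You said: ${input}`; // placeholder for a real model call
    this.history.push({ role: "assistant", channel, content: reply });
    return reply;
  }

  // Text input goes straight to the shared path.
  onText(message: string): string {
    return this.respond(message, "text");
  }

  // Voice input is transcribed, handled by the same path, then synthesized.
  onAudio(audio: Uint8Array): Uint8Array {
    const transcript = this.stt.transcribe(audio);
    return this.tts.synthesize(this.respond(transcript, "voice"));
  }
}

// Fake providers for the sketch: "audio" is just UTF-8 encoded text.
const fakeStt: SpeechToText = {
  transcribe: (audio) => new TextDecoder().decode(audio),
};
const fakeTts: TextToSpeech = {
  synthesize: (text) => new TextEncoder().encode(text),
};

const agent = new SketchAgent(fakeStt, fakeTts);
agent.onText("hello");
const spoken = agent.onAudio(new TextEncoder().encode("what did I type?"));

console.log(new TextDecoder().decode(spoken)); // prints "You said: what did I type?"
console.log(agent.history.length); // prints 4
```

Because both entry points funnel into one `respond` method, the text turn and the voice turn land in the same history. Keeping the provider interfaces down to one method each is also what would make it easy to swap in a different STT or TTS backend.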

12m read time. From blog.cloudflare.com
Table of contents

- Get started with voice
- How the voice pipeline works
- Why voice should grow with the rest of your agent
- Lower latency comes from...
- A more realistic backend
- Voice as an input: withVoiceInput
- Voice and text on the same connection
- What else can you build?
- Pipeline hooks
- Telephone and transport options
- Build with us
- Try it now
