Add voice to your agent

Cloudflare has released an experimental `@cloudflare/voice` package for its Agents SDK, enabling real-time voice interactions over WebSockets. Developers can add continuous speech-to-text (STT) and text-to-speech (TTS) to existing Durable Object-based agents in roughly 30 lines of server-side code. The pipeline integrates Workers AI providers (Deepgram Flux, Nova 3, Aura) with no external API keys required, supports React hooks and a framework-agnostic VoiceClient, and includes a Twilio adapter for phone call handling. The architecture keeps voice as just another input channel to the same agent, preserving SQLite-backed conversation history, tool access, and scheduling across text and voice modalities. Provider interfaces are intentionally minimal to allow third-party STT, TTS, and telephony integrations.
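The "voice as just another input channel" idea can be sketched in a few lines of TypeScript. This is a minimal illustration, not the `@cloudflare/voice` API: every name below (`SpeechToText`, `TextToSpeech`, `SketchAgent`, `onText`, `onVoice`) is hypothetical, the providers are synchronous stubs, and an in-memory array stands in for the SQLite-backed history. It only shows the shape of the design: both channels pass through the same reply logic and write to the same history.

```typescript
// Hypothetical names throughout: none of these types come from @cloudflare/voice.

// Minimal provider contracts. Real providers would be asynchronous and
// streaming; they are synchronous here to keep the sketch short.
interface SpeechToText {
  transcribe(audio: Uint8Array): string;
}

interface TextToSpeech {
  synthesize(text: string): Uint8Array;
}

type Channel = "text" | "voice";

interface Turn {
  role: "user" | "agent";
  channel: Channel;
  content: string;
}

// ASCII-only helpers so the stub providers can treat "audio" as plain bytes.
function encode(text: string): Uint8Array {
  return Uint8Array.from(text, (c) => c.charCodeAt(0));
}

function decode(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// One agent, one history, two input channels.
class SketchAgent {
  readonly history: Turn[] = [];
  private stt: SpeechToText;
  private tts: TextToSpeech;

  constructor(stt: SpeechToText, tts: TextToSpeech) {
    this.stt = stt;
    this.tts = tts;
  }

  // Shared reply logic: records the user turn and the agent's echo reply.
  private respond(content: string, channel: Channel): string {
    this.history.push({ role: "user", channel, content });
    const reply = `You said: ${content}`;
    this.history.push({ role: "agent", channel, content: reply });
    return reply;
  }

  onText(message: string): string {
    return this.respond(message, "text");
  }

  // Voice is transcribed, routed through the same logic, then synthesized.
  onVoice(audio: Uint8Array): Uint8Array {
    const transcript = this.stt.transcribe(audio);
    return this.tts.synthesize(this.respond(transcript, "voice"));
  }
}

// Stub providers: the "audio" is just the ASCII bytes of the utterance.
const stubStt: SpeechToText = { transcribe: (audio) => decode(audio) };
const stubTts: TextToSpeech = { synthesize: (text) => encode(text) };

const agent = new SketchAgent(stubStt, stubTts);
agent.onText("hello");
const spoken = agent.onVoice(encode("what did I say?"));

console.log(decode(spoken)); // You said: what did I say?
console.log(agent.history.length); // 4
```

Because the provider contracts are a single method each, replacing a stub with a third-party STT or TTS service would mean implementing one small interface, which is the property the summary attributes to the real package's provider interfaces.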

#cloudflare

#speech-recognition

#text-to-speech

Apr 15 • 12m read time • From blog.cloudflare.com

Table of contents

Get started with voice
How the voice pipeline works
Why voice should grow with the rest of your agent
Lower latency comes from...
A more realistic backend
Voice as an input: withVoiceInput
Voice and text on the same connection
What else can you build?
Pipeline hooks
Telephone and transport options
Build with us
Try it now
