Google has launched Gemini 3.1 Flash TTS, a new text-to-speech model available on Google AI Studio and Vertex AI in public preview. The model supports 70+ languages, 30 prebuilt voices, and over 200 audio tags that can be embedded directly into text prompts to control pacing, expression, and vocal style. Tags like [whispers], [panic], [slow], and [long pause] are inserted inline to steer delivery with granular precision. The post covers the core prompting framework, common tags, and practical use cases including audiobooks, gaming accessibility descriptions, banking fraud alerts, and automated flight notifications. Audio output is watermarked with SynthID for AI content identification.
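
As a rough illustration of the inline-tag idea described above, the sketch below assembles a prompt string with bracketed tags placed before the text they are meant to steer. The helper names `tag` and `build_prompt` are hypothetical and not part of any Google SDK, and the sample sentences are invented. The code only builds the text; it makes no call to the model, and the exact tag placement rules should be checked against the original post.

```python
from typing import List, Optional, Tuple


def tag(name: str) -> str:
    """Wrap a tag name in square brackets, e.g. 'whispers' -> '[whispers]'."""
    return f"[{name.strip()}]"


def build_prompt(segments: List[Tuple[Optional[str], str]]) -> str:
    """Join (tag_name, text) pairs into one prompt string with inline tags.

    A tag_name of None means the text is emitted without a tag.
    """
    parts = []
    for tag_name, text in segments:
        parts.append(f"{tag(tag_name)} {text}" if tag_name else text)
    return " ".join(parts)


# Hypothetical flight-notification script using tags named in the summary.
prompt = build_prompt([
    (None, "Your flight to Lisbon is now boarding."),
    ("slow", "Please proceed to gate twelve."),
    ("long pause", "Thank you for your patience."),
])
print(prompt)
```

Keeping tags and text as separate pairs until the final join makes it easier to swap a tag or reorder a line without hand-editing one long string.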

6m read time, from cloud.google.com
Table of contents
1. Model overview
2. Voice style instructions
3. The core prompting framework for audio tags
