Today, we're introducing Gemini 3.1 Flash Text-to-Speech (TTS), our latest TTS model, available on Google AI Studio and Vertex AI. It offers precise control over delivery and expressiveness, helping developers and enterprises build advanced AI speech applications.


Google Cloud

Google has launched Gemini 3.1 Flash TTS, a new text-to-speech model available on Google AI Studio and Vertex AI in public preview. The model supports 70+ languages, 30 prebuilt voices, and over 200 audio tags that can be embedded directly into text prompts to control pacing, expression, and vocal style. Tags like [whispers], [panic], [slow], and [long pause] are inserted inline to steer delivery with granular precision. The post covers the core prompting framework, common tags, and practical use cases including audiobooks, gaming accessibility descriptions, banking fraud alerts, and automated flight notifications. Audio output is watermarked with SynthID for AI content identification.
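Because audio tags are written directly into the text prompt, a prompt is ultimately just a string with bracketed tags placed before the passages they should affect. The sketch below composes such a string from (tag, text) pairs. It assumes tags are plain bracketed words, as in the examples above. `KNOWN_TAGS`, `tag`, and `build_prompt` are hypothetical helpers, and the tag set is a small illustrative subset rather than the model's full list. The request format and model ID for sending the prompt are not covered here, so the sketch stops at building the text.

```python
from typing import List, Optional, Tuple

# Hypothetical, illustrative subset of audio tags taken from the examples in
# this post; the model supports 200+ tags, which are not listed here.
KNOWN_TAGS = {"whispers", "panic", "slow", "long pause"}


def tag(name: str) -> str:
    """Return an inline audio tag such as "[whispers]" (hypothetical helper)."""
    if name not in KNOWN_TAGS:
        raise ValueError(f"unknown audio tag: {name!r}")
    return f"[{name}]"


def build_prompt(segments: List[Tuple[Optional[str], str]]) -> str:
    """Join (tag, text) pairs into one prompt string.

    A tag of None means the text is spoken without an explicit cue; empty
    text lets a tag such as "long pause" stand on its own.
    """
    parts: List[str] = []
    for name, text in segments:
        if name is not None:
            parts.append(tag(name))  # the tag is placed inline, before its text
        if text:
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    prompt = build_prompt([
        (None, "The vault door creaked open."),
        ("whispers", "Is anyone there?"),
        ("long pause", ""),
        ("panic", "Run!"),
    ])
    # Prints: The vault door creaked open. [whispers] Is anyone there? [long pause] [panic] Run!
    print(prompt)
```

Checking names against a known set catches typos such as `[whisper]` before a request is sent. A production version would load the full tag list instead of hard-coding a few entries.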

Gemini 3.1 Flash TTS on Google Cloud

3. The core prompting framework for audio tags