NVIDIA introduces three new Riva TTS models that advance multilingual speech synthesis and voice cloning. Magpie TTS Multilingual supports four languages with a streaming encoder-decoder architecture, Magpie TTS Zeroshot enables voice cloning from 5-second samples, and Magpie TTS Flow targets studio applications and works from 3-second voice samples. All three models use preference alignment and classifier-free guidance to improve text adherence and reduce audio artifacts. The models are reported to achieve lower character error rates and better naturalness than open-source alternatives while requiring less training data.
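For readers unfamiliar with the character error rate mentioned above, the following is a minimal sketch of how the metric is commonly computed: the character-level edit distance between a reference transcript and the transcript recognized from the synthesized audio, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative assumptions, not part of NVIDIA Riva, and the summary does not say which evaluation tooling NVIDIA used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters (hypothetical helper)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

In a TTS evaluation, `reference` would typically be the input text and `hypothesis` the output of a speech recognizer run on the generated audio, so a lower value indicates closer adherence to the text.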

9m read time · From developer.nvidia.com
Table of contents
Streaming encoder-decoder transformer
Magpie TTS Flow
Safety collaborations
Get started with NVIDIA Riva Magpie TTS models
