
Microsoft released VibeVoice, an open-source text-to-speech model that can generate up to 90 minutes of conversational audio with up to 4 distinct speakers. It uses continuous speech tokenizers running at 7.5 Hz and a next-token diffusion framework that combines LLM understanding with diffusion-based acoustic generation. It is available in 1.5B and 7B parameter versions on Hugging Face, supports cross-lingual synthesis, and can spontaneously generate background music. The model is intended for research use, and the repository includes installation instructions, demo examples, and usage guidelines.
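To get a feel for why the 7.5 Hz figure matters for 90-minute generation, here is a rough back-of-the-envelope sketch. It assumes one tokenizer frame per 1/7.5 s of audio, which the summary does not spell out. The 50 Hz rate is a hypothetical comparison point, not a number from the VibeVoice release, and `frames_for_duration` is an illustrative helper, not part of any VibeVoice API.

```python
def frames_for_duration(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


# 90 minutes at the 7.5 Hz rate quoted in the summary.
vibevoice_frames = frames_for_duration(90, 7.5)

# Hypothetical 50 Hz tokenizer, used here only for comparison (assumption).
comparison_frames = frames_for_duration(90, 50.0)

print(vibevoice_frames)   # frames for 90 minutes at 7.5 Hz
print(comparison_frames)  # frames for 90 minutes at the assumed 50 Hz
```

Under these assumptions, the lower frame rate keeps the sequence for a full 90-minute session several times shorter than a higher-rate tokenizer would, which is presumably what makes such long generations tractable.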

microsoft/VibeVoice: Frontier Open-Source Text-to-Speech


The Githubers

Yeah, the link takes us nowhere 😞