Blog (STT): https://kyutai.org/next/stt
Blog (TTS): https://kyutai.org/next/tts
GitHub: https://github.com/kyutai-labs/delayed-streams-modeling
Colab: https://dripl.ink/QZevZ

For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen

🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes

👨‍💻 GitHub:
https://github.com/samwit/llm-tutorials

⏱️ Timestamps:
00:00 Intro
00:43 Kyutai
00:59 Kyutai STT
01:40 Kyutai TTS
05:22 Kyutai TTS Demo

Sam Witteveen AI is a publication offering insights, tutorials, and resources for artificial intelligence (AI) enthusiasts and practitioners. Readers can learn about machine learning algorithms, deep learning frameworks, and AI applications. With tutorials, case studies, and expert interviews, Sam Witteveen AI provides guidance and expertise for building and deploying AI solutions.

Sam Witteveen

Kyutai has released separate speech-to-text (STT) and text-to-speech (TTS) models that offer low-latency voice processing for English and French. The TTS model has only 1.6B parameters and performs competitively with commercial solutions like ElevenLabs. The models support voice cloning through embeddings, but the voice embedding model itself has not been released, for ethical reasons. Users can blend existing voice embeddings to create new voices but cannot generate embeddings from custom audio samples. The models show promise for local voice applications but are currently limited by language support and the restricted voice cloning capability.
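
To make the idea of blending voice embeddings concrete, here is a minimal sketch that treats each voice as a plain vector of floats and mixes two of them by linear interpolation. This is an assumption for illustration only: the function name `blend_embeddings`, the toy 4-dimensional vectors, and the interpolation scheme are hypothetical and are not taken from the Kyutai code, which may store and combine its embeddings differently.

```python
from typing import List, Sequence


def blend_embeddings(a: Sequence[float], b: Sequence[float], weight: float = 0.5) -> List[float]:
    """Linearly interpolate between two equal-length embedding vectors.

    weight=0.0 returns a copy of `a`, weight=1.0 returns a copy of `b`.
    Hypothetical helper: not part of any Kyutai API.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


# Toy vectors standing in for real voice embeddings (made-up values).
voice_a = [1.0, 0.0, 2.0, -1.0]
voice_b = [0.0, 1.0, 0.0, 1.0]

# 75% of voice_a mixed with 25% of voice_b.
blended = blend_embeddings(voice_a, voice_b, weight=0.25)
print(blended)
```

Sweeping `weight` from 0 to 1 would move the result gradually from the first voice to the second. Whether a real TTS model produces a natural-sounding voice from such a mix depends on how its embedding space is structured.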

Kyutai STT & TTS - A Perfect Local Voice Solution?

I would want to hook this up to my home assistant!