Dia is a 1.6B-parameter text-to-speech model from Nari Labs that generates ultra-realistic dialogue in a single pass. It supports conditioning on an audio sample to control emotion and tone, and it can produce nonverbal communication. Pretrained model checkpoints and inference code are available on Hugging Face, and a demo is also available.

3 min read · From github.com
Table of contents
⚡️ Quickstart
⚙️ Usage
💻 Hardware and Inference Speed
🪪 License
⚠️ Disclaimer
🔭 TODO / Future Work
🤝 Contributing
🤗 Acknowledgements
