Dia is a 1.6B-parameter text-to-speech model developed by Nari Labs that generates ultra-realistic dialogue in a single pass. It supports conditioning on audio for emotion and tone control and can produce nonverbal communications. Pretrained model checkpoints and inference code are available on Hugging Face, and a demo is available.
Table of contents
- ⚡️ Quickstart
- ⚙️ Usage
- 💻 Hardware and Inference Speed
- 🪪 License
- ⚠️ Disclaimer
- 🔭 TODO / Future Work
- 🤝 Contributing
- 🤗 Acknowledgements