Dia is a 1.6B-parameter text-to-speech model developed by Nari Labs that generates ultra-realistic dialogue in a single pass. It supports audio conditioning for emotion and tone control and can produce nonverbal cues such as laughter or coughing directly from the transcript. Pretrained model checkpoints and inference code are available on Hugging Face, and a demo page compares it with ElevenLabs Studio and Sesame CSM-1B. The model currently runs on GPUs, with CPU support planned, and ships with detailed installation instructions as well as ethical and legal guidelines for its use.
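As a rough sketch of how inference looks, the snippet below follows the API shown in the project's README (`Dia.from_pretrained`, `generate`, `save_audio`); the exact install method, speaker-tag format, and output filename here are assumptions, and running it requires the `dia` package from the Nari Labs repository plus a CUDA GPU.

```python
# Dialogue is written with speaker tags ([S1], [S2]); nonverbal cues
# such as (laughs) can be embedded directly in the text.
text = "[S1] Dia generates full dialogue in one pass. [S2] Really? (laughs)"


def synthesize(out_path: str = "dialogue.mp3") -> str:
    """Generate audio for `text` and write it to `out_path`.

    Assumes the `dia` package is installed from the Nari Labs repo
    and a CUDA GPU is available (CPU support is listed as future work).
    """
    from dia.model import Dia  # deferred import: heavy, GPU-backed dependency

    model = Dia.from_pretrained("nari-labs/Dia-1.6B")
    audio = model.generate(text)        # synthesize the whole dialogue at once
    model.save_audio(out_path, audio)   # write the waveform to disk
    return out_path
```

Keeping the transcript in the tagged `[S1]`/`[S2]` format is what lets the model switch voices within a single generation rather than stitching per-speaker clips together.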
Table of contents
- ⚡️ Quickstart
- ⚙️ Usage
- 💻 Hardware and Inference Speed
- 🪪 License
- ⚠️ Disclaimer
- 🔭 TODO / Future Work
- 🤝 Contributing
- 🤗 Acknowledgements