Parler-TTS introduced two new text-to-speech models: a lightweight Parler-TTS Mini v0.1 and a high-quality Parler-TTS Large v1. These models use natural language descriptions to control speech aspects like gender, background noise, and speaking rate. Key advancements include automatic labeling of large datasets and a decoder-only Transformer architecture. The models demonstrate significant improvements in generating high-fidelity speech. The post also provides a step-by-step guide for inference and fine-tuning on custom datasets.