MegaTTS3 by ByteDance is a lightweight and efficient text-to-speech (TTS) model with only 0.45B parameters. It supports high-quality voice cloning, bilingual (Chinese and English) speech synthesis, and accent intensity control. Users can download pre-trained models, run inference with command-line tools, and access a web UI.

5m read time · From github.com
Table of contents: Key features 🎯, Roadmap, Installation, Inference, Submodules, Security, License, Citation