ASLP-lab/DiffRhythm: Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
DiffRhythm is an open-source, diffusion-based AI model that generates full-length songs of up to 4 minutes and 45 seconds. It supports text-to-music generation, an instrumental mode, and reference audio input. The model requires at least 8 GB of VRAM and is offered in multiple versions, including the latest v1.2 release, which improves audio quality and reduces repetition. The project provides Docker deployment and local installation guides, and it is available on Hugging Face under the Apache 2.0 license.