StableAvatar is an end-to-end video diffusion transformer that generates infinite-length, high-quality audio-driven avatar videos without any post-processing. It addresses the main limitation of existing models, namely audio modeling that causes the latent distribution to drift over long videos, through a Time-step-aware Audio Adapter and
