Francis-Rings/StableAvatar: We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing
StableAvatar is an end-to-end video diffusion transformer that generates infinite-length, high-quality audio-driven avatar videos without post-processing. It addresses the main limitation of existing models (audio modeling issues that cause latent distribution drift in long videos) through a Time-step-aware Audio Adapter and …