ELLA is a novel method that integrates powerful Large Language Models (LLMs) into text-to-image diffusion models to improve how they handle complicated prompts. It introduces the Timestep-Aware Semantic Connector (TSC) for dynamic semantic alignment. ELLA shows superior performance in following complex prompts.

5m read time From marktechpost.com