Research from Carnegie Mellon University shows that diffusion models outperform autoregressive models when training data is limited. Autoregressive models are more compute-efficient, but diffusion models are more data-efficient: they tolerate up to 100 epochs of repeated data without overfitting, whereas autoregressive models plateau at around 4 epochs. The study establishes new scaling laws and offers a practical guideline: use autoregressive models when compute is the bottleneck, and diffusion models when data is the bottleneck.
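
The guideline above can be turned into a rough rule of thumb. The sketch below is illustrative only: it computes how many passes over the unique data a training budget implies and compares that with the two epoch figures quoted in the summary. The function names, the hard threshold, and the token counts in the usage note are assumptions made for this example; the study's actual scaling laws are not reproduced here.

```python
# Epoch figures quoted in the summary above. Treating them as hard
# thresholds is an assumption of this sketch, not the study's method.
AR_PLATEAU_EPOCHS = 4.0       # autoregressive models plateau around here
DIFFUSION_MAX_EPOCHS = 100.0  # diffusion models reported to handle up to here


def repetition_epochs(unique_tokens: int, training_tokens: int) -> float:
    """Return how many passes over the unique data the budget implies."""
    if unique_tokens <= 0:
        raise ValueError("unique_tokens must be positive")
    return training_tokens / unique_tokens


def suggest_model_family(unique_tokens: int, training_tokens: int) -> str:
    """Hypothetical helper: pick a model family from the implied repetition."""
    epochs = repetition_epochs(unique_tokens, training_tokens)
    if epochs <= AR_PLATEAU_EPOCHS:
        # Little repetition: compute, not data, is the likely bottleneck.
        return "autoregressive"
    # Heavy repetition: data is the bottleneck. Above DIFFUSION_MAX_EPOCHS
    # the summary reports nothing, so the answer is an extrapolation there.
    return "diffusion"
```

For example, under these assumptions `suggest_model_family(1_000_000, 2_000_000)` implies 2 epochs and suggests an autoregressive model, while `suggest_model_family(1_000_000, 50_000_000)` implies 50 epochs and suggests a diffusion model.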