Research from Carnegie Mellon University demonstrates that diffusion models outperform autoregressive models in data-constrained scenarios. Autoregressive models excel when compute is limited, but diffusion models perform better when data is the bottleneck: they resist overfitting more effectively and continue to benefit from repeated training data for up to 100 epochs. The study establishes new scaling laws and offers a practical guideline: use autoregressive models when compute-constrained and diffusion models when data-constrained.

10 min read time, from blog.ml.cmu.edu