The post discusses the training dynamics of diffusion models and the problems caused by growth in weight and activation magnitudes. It introduces a remedy that eliminates this growth and controls learning rate decay. The findings apply not only to diffusion models but also to other neural networks.
Table of contents
What are training dynamics and why do they matter?
Taking control of weight and activation magnitudes
The remedy
Exponential moving averages
Results and conclusions