A visual walkthrough of how machines learn through optimization, starting from basic gradient descent and building up to AdamW. Covers the intuition behind stochastic gradient descent, momentum (moving average of gradients), RMSProp (adaptive learning rates via moving average of squared gradients), and how Adam combines both.
7 min watch time
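As a rough companion to the ideas in the description (a minimal NumPy sketch, not code from the video; all function names and hyperparameter values here are illustrative assumptions, and the momentum variant shown is the exponential-moving-average form):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain stochastic gradient descent: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: keep an exponential moving average of past gradients and step along it.
    v = beta * v + (1 - beta) * grad
    return w - lr * v, v

def rmsprop_step(w, grad, s, lr=0.001, beta=0.999, eps=1e-8):
    # RMSProp: scale the step by a moving average of squared gradients (adaptive learning rate).
    s = beta * s + (1 - beta) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: combine momentum (first moment) with RMSProp-style scaling (second moment),
    # plus bias correction for the zero-initialized averages.
    v = beta1 * v + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad**2
    v_hat = v / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s

def adamw_step(w, grad, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    # AdamW: the same Adam update with decoupled weight decay applied directly to the weights.
    w, v, s = adam_step(w, grad, v, s, t, lr, beta1, beta2, eps)
    return w - lr * wd * w, v, s

# Toy usage: minimize f(w) = ||w||^2 with Adam; w should shrink toward zero.
w = np.array([3.0, -2.0])
v, s = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    grad = 2 * w                      # gradient of ||w||^2
    w, v, s = adam_step(w, grad, v, s, t, lr=0.1)
print(w)
```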