What the Books Get Wrong about AI [Double Descent]
Double descent challenges the traditional bias-variance tradeoff taught in machine learning textbooks. Classical theory predicts a U-shaped curve: test error falls as model capacity grows, reaches a minimum, and then rises as the model overfits. Research from 2018-2019 (Belkin et al., 2019; Nakkiran et al., 2019) showed instead that, past the interpolation threshold where a model can fit the training data perfectly, test error can descend a second time as the model grows even larger. One explanation is that overparameterized models have enough flexibility to select smoother, lower-norm solutions among the many that interpolate the training data, and these solutions generalize better. This contradicts the assumption that fitting training data exactly must cause poor generalization, and it helps explain why massive neural networks generalize well despite having the capacity to memorize their training sets.