This post walks through reproducing the GPT-2 (124M) model: loading the pretrained weights, implementing the architecture from scratch, and generating text. It also introduces the Tiny Shakespeare dataset and shows how to use it for training, demonstrating loss calculation and optimization with PyTorch.
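The loss calculation and optimization step mentioned above can be sketched in PyTorch. This is an illustrative stand-in, not the author's actual GPT-2 code: `TinyLM`, the layer sizes, and the random batch (standing in for tokenized Tiny Shakespeare text) are all assumptions chosen to keep the example self-contained.

```python
# Minimal sketch (assumed, not the original implementation) of computing
# cross-entropy loss and taking optimizer steps, as described in the post.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(42)
vocab_size, n_embd, block_size = 65, 32, 8  # tiny illustrative sizes

class TinyLM(nn.Module):
    """Hypothetical stand-in model: embedding + linear head, not full GPT-2."""
    def __init__(self):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.head(self.wte(idx))  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            # Flatten (B, T, V) -> (B*T, V) for cross-entropy, the usual
            # shape handling when training language models.
            loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
        return logits, loss

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

# Fake batch standing in for a tokenized Tiny Shakespeare chunk.
x = torch.randint(0, vocab_size, (4, block_size))
y = torch.randint(0, vocab_size, (4, block_size))

losses = []
for _ in range(20):
    optimizer.zero_grad()
    _, loss = model(x, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

On a fixed tiny batch the loss should fall steadily from roughly `ln(65) ≈ 4.17`, which is a quick sanity check that the backward pass and optimizer are wired up correctly.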
4h 1m watch time