A visual, interactive walkthrough of Andrej Karpathy's 200-line Python GPT implementation trained on 32,000 human names. It covers every core concept step by step: character-level tokenization, the sliding-window prediction task, softmax and cross-entropy loss, backpropagation through a scalar computation graph, token and positional embeddings, attention, and sampling new names.
10 min read · growingswe.com
Table of contents
- The dataset
- Numbers, not letters
- The prediction game
- From scores to probabilities
- Measuring surprise
- Tracking every calculation
- From IDs to meaning
- How tokens talk to each other
- The full picture
- Learning
- Making things up
- Everything else is efficiency
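As a taste of the first two steps above, here is a minimal sketch (not the article's actual code) of character-level tokenization and the sliding-window prediction task on a tiny list of names. The names, the `block_size` of 3, and the use of `0` as a start/end marker are illustrative assumptions.

```python
# Hypothetical sketch: character-level tokenization plus
# sliding-window (context, target) pairs for a tiny name list.
names = ["emma", "olivia", "ava"]

# Vocabulary: every character seen, with id 0 reserved for the
# '.' start/end marker (an assumed convention, not the article's).
chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}  # char -> integer id
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}          # id -> char

def encode(name):
    """Map a name to a list of integer token ids."""
    return [stoi[ch] for ch in name]

block_size = 3  # how many previous characters the model sees

def sliding_windows(name):
    """Yield (context, target) training pairs for one name."""
    ids = [0] * block_size          # pad the left edge with markers
    for tok in encode(name) + [0]:  # 0 also marks the end of the name
        yield ids[-block_size:], tok
        ids = ids + [tok]

pairs = list(sliding_windows("ava"))
# Each pair is (list of block_size context ids, next-token id);
# "ava" yields 4 pairs, the last one predicting the end marker 0.
```

Every name of length *n* thus contributes *n* + 1 prediction examples, which is why a few thousand names give the model plenty of training signal.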