Andrej Karpathy introduces microgpt, a roughly 200-line pure-Python implementation of GPT with no dependencies. The project distills the complete algorithmic essence of training and running a GPT model into a single file, covering tokenization, autograd from scratch, the Transformer architecture (attention, MLP, embeddings), the training loop, and inference.
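To give a flavor of what "autograd from scratch" means here, below is a minimal scalar autograd sketch in the micrograd style that Karpathy's work popularized. This is an illustrative reconstruction, not the actual microgpt code: the class name `Value` and its methods are assumptions for the example.

```python
# Minimal scalar autograd sketch (micrograd-style), for illustration only.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # set by the op that produced this node
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1, d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: gradients scale by the other factor
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort of the computation graph, then apply the
        # chain rule from the output node back to the leaves
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# d(a*b + a)/da = b + 1 = 4;  d(a*b + a)/db = a = 2
a, b = Value(2.0), Value(3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

The same idea scales up: each tensor op records a local backward rule, and one reverse topological pass accumulates gradients for the whole network.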

28-minute read, from karpathy.github.io
Table of contents:
- Dataset
- Tokenizer
- Autograd
- Parameters
- Architecture
- Training loop
- Inference
- Run it
- Progression
- Real stuff
- FAQ
