An interactive, browser-based visualization of a tiny GPT model explains transformer architecture fundamentals through a Q&A format. It covers attention mechanisms, weight matrices, normalization techniques, residual connections, and training dynamics. The minimal model uses 16 dimensions, 4 attention heads, and learns simple
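The dimensions quoted above (16-dimensional embeddings split across 4 attention heads, plus a residual connection) can be sketched as follows. This is a minimal illustration of those shapes, not the article's actual code; all weight names and the sequence length of 8 are assumptions for the example.

```python
import numpy as np

# Assumed sketch: d_model = 16 split into 4 heads of size 4, as in the summary.
d_model, n_heads = 16, 4
d_head = d_model // n_heads  # 4

rng = np.random.default_rng(0)
x = rng.normal(size=(8, d_model))          # 8 tokens, 16-dim embeddings (length is illustrative)
Wq = rng.normal(size=(d_model, d_model))   # query projection
Wk = rng.normal(size=(d_model, d_model))   # key projection
Wv = rng.normal(size=(d_model, d_model))   # value projection

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    T = x.shape[0]
    # Project, then split the 16-dim vectors into 4 heads of 4 dims each.
    q = (x @ Wq).reshape(T, n_heads, d_head)
    k = (x @ Wk).reshape(T, n_heads, d_head)
    v = (x @ Wv).reshape(T, n_heads, d_head)
    # Scaled dot-product scores per head: shape (heads, T, T).
    scores = np.einsum('thd,shd->hts', q, k) / np.sqrt(d_head)
    # Causal mask: each token attends only to itself and earlier tokens.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    w = softmax(scores, axis=-1)
    # Weighted sum of values, then concatenate heads back to 16 dims.
    out = np.einsum('hts,shd->thd', w, v).reshape(T, d_model)
    return x + out  # residual connection, as mentioned in the summary

y = attention(x)
print(y.shape)  # (8, 16): same shape in and out, so blocks can be stacked
```

Keeping the input and output shapes identical is what lets transformer blocks stack, and the residual addition is why each block only needs to learn an update to the representation.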

3 min read · From microgpt.boratto.ca