A hands-on demonstration of training a minimal single-layer, single-head transformer on a genuine 1979 PDP-11/44 minicomputer. The project, called Attention11, is written in raw PDP-11 assembly language and uses fixed-point arithmetic instead of floating point. The task is simple (learning to reverse an 8-digit sequence), yet it exposes the full mechanics of transformer training: forward pass, softmax, loss calculation, backpropagation, and weight updates. With only 1,216 parameters, the model fits in 32KB of memory and converges to 100% accuracy in about 350 training steps, roughly 3.5 minutes on the 11/44. The piece demystifies modern AI by showing that the core learning loop is pure arithmetic (making guesses, measuring error, and nudging weights) and argues that hardware constraints force better engineering thinking.
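
To make the "guess, measure error, nudge weights" loop concrete, here is a minimal sketch in C of gradient descent done entirely in fixed-point integer arithmetic, the same style of arithmetic a machine without fast floating point forces on you. The Q8.8 format, the single scalar weight, and the toy target y = 3x are assumptions chosen for illustration; they are not the project's actual format, code, or 1,216-parameter model.

```c
/* Sketch: a "guess, measure error, nudge weight" loop in pure integer math.
 * Assumptions (not from the Attention11 source): Q8.8 fixed point and a
 * single scalar weight fit to the toy target y = 3 * x. */
#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 8                        /* Q8.8: value = raw / 256       */
#define TO_FIX(x)   ((int32_t)((x) * (1 << FRAC_BITS)))
#define FROM_FIX(x) ((double)(x) / (1 << FRAC_BITS))

/* Fixed-point multiply: widen, multiply, shift back down.
 * (Arithmetic right shift of negative values is assumed, as on most compilers.) */
static int32_t fmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

int main(void)
{
    int32_t w  = TO_FIX(0.0);              /* weight starts at zero         */
    int32_t lr = TO_FIX(0.01);             /* learning rate                 */

    for (int step = 0; step < 300; step++) {
        int32_t x      = TO_FIX(1.0 + (step % 4));  /* toy input            */
        int32_t target = fmul(TO_FIX(3.0), x);      /* desired output: 3x   */

        int32_t y    = fmul(w, x);         /* forward pass: make a guess    */
        int32_t err  = y - target;         /* measure the error             */
        int32_t grad = fmul(err, x);       /* gradient of squared error     */

        w -= fmul(lr, grad);               /* nudge the weight              */

        if (step % 50 == 0)
            printf("step %3d  w = %.4f\n", step, FROM_FIX(w));
    }
    /* Fixed-point truncation leaves w close to, not exactly at, 3.0. */
    printf("final w = %.4f (target 3.0)\n", FROM_FIX(w));
    return 0;
}
```

The same three steps, repeated over 1,216 weights instead of one and driven by softmax attention and cross-entropy loss, are all that the full training run on the 11/44 amounts to.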
