A hands-on demonstration of training a minimal single-layer, single-head transformer on a genuine 1979 PDP-11/44 minicomputer. The project, called Attention11, is written in raw PDP-11 assembly language and uses fixed-point arithmetic instead of floating point. The task is simple (learning to reverse an 8-digit sequence), yet it exposes the full mechanics of transformer training: forward pass, softmax, loss calculation, backpropagation, and weight updates. With only 1,216 parameters, the model fits in 32KB of memory and converges to 100% accuracy in about 350 training steps, roughly 3.5 minutes on the 11/44. The piece demystifies modern AI by showing that the core learning loop is pure arithmetic (making guesses, measuring error, and nudging weights) and argues that hardware constraints force better engineering thinking.
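
To make the "guess, measure error, nudge weights" loop concrete, here is a minimal sketch in C of gradient descent done entirely in fixed-point integer arithmetic, the same style of arithmetic a machine without fast floating point forces on you. The Q8.8 format, the single scalar weight, and the toy target y = 3x are assumptions chosen for illustration; they are not the project's actual format, code, or 1,216-parameter model.

```c
/* Sketch: a "guess, measure error, nudge weight" loop in pure integer math.
 * Assumptions (not from the Attention11 source): Q8.8 fixed point and a
 * single scalar weight fit to the toy target y = 3 * x. */
#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 8                        /* Q8.8: value = raw / 256       */
#define TO_FIX(x)   ((int32_t)((x) * (1 << FRAC_BITS)))
#define FROM_FIX(x) ((double)(x) / (1 << FRAC_BITS))

/* Fixed-point multiply: widen, multiply, shift back down.
 * (Arithmetic right shift of negative values is assumed, as on most compilers.) */
static int32_t fmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

int main(void)
{
    int32_t w  = TO_FIX(0.0);              /* weight starts at zero         */
    int32_t lr = TO_FIX(0.01);             /* learning rate                 */

    for (int step = 0; step < 300; step++) {
        int32_t x      = TO_FIX(1.0 + (step % 4));  /* toy input            */
        int32_t target = fmul(TO_FIX(3.0), x);      /* desired output: 3x   */

        int32_t y    = fmul(w, x);         /* forward pass: make a guess    */
        int32_t err  = y - target;         /* measure the error             */
        int32_t grad = fmul(err, x);       /* gradient of squared error     */

        w -= fmul(lr, grad);               /* nudge the weight              */

        if (step % 50 == 0)
            printf("step %3d  w = %.4f\n", step, FROM_FIX(w));
    }
    /* Fixed-point truncation leaves w close to, not exactly at, 3.0. */
    printf("final w = %.4f (target 3.0)\n", FROM_FIX(w));
    return 0;
}
```

The same three steps, repeated over 1,216 weights instead of one and driven by softmax attention and cross-entropy loss, are all that the full training run on the 11/44 amounts to.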
