This project is a single-layer, single-head transformer implemented in PDP-11 assembly language and trained to reverse sequences of digits. It explores running a minimal transformer on 1970s hardware within 32 KB of core memory. Key engineering challenges include replacing floating-point operations with fixed-point arithmetic (Q8 forward, Q15 backward, Q16 accumulators), using precomputed lookup tables for the exp and log functions, and hand-tuning per-layer learning rates instead of using Adam, which avoids the memory overhead of optimizer state. The optimized model converges in 350 training steps, which take 5.5 minutes on a real PDP-11/34A. The implementation includes NN11, a custom fixed-point neural network stack organized like BLAS, and was prototyped in a functional ML framework called Sheaf before being committed to assembly. A cycle-accurate PDP-11/34 emulator and a WebAssembly demo are also provided.
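To make the fixed-point and lookup-table ideas concrete, here is a minimal Python sketch. It is not the project's NN11 code. It assumes that "Q8" means 8 fractional bits stored in a signed 16-bit word and that products are summed at a wider precision before being shifted back. The helper names (`to_q8`, `q8_dot`, `exp_q8`, `softmax_q8`) and the table range and resolution are hypothetical choices made for illustration.

```python
import math

FRAC_BITS = 8                # assumption: "Q8" means 8 fractional bits
ONE = 1 << FRAC_BITS         # 1.0 in Q8 is 256


def to_q8(x: float) -> int:
    """Convert a float to Q8, saturating to a signed 16-bit word."""
    v = int(round(x * ONE))
    return max(-32768, min(32767, v))


def q8_dot(a, b) -> int:
    """Dot product of two Q8 vectors, returned in Q8.

    Each product of two Q8 values carries 16 fractional bits, so the
    running sum is kept at that wider precision and shifted back once.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc >> FRAC_BITS  # arithmetic shift, rounds toward -infinity


# Hypothetical exp table covering [-8, 0] in steps of 1/16 (129 entries).
EXP_MIN = -8
EXP_STEP_BITS = 4
EXP_TABLE = [
    to_q8(math.exp(EXP_MIN + i / (1 << EXP_STEP_BITS)))
    for i in range(-EXP_MIN * (1 << EXP_STEP_BITS) + 1)
]


def exp_q8(x_q8: int) -> int:
    """Approximate exp(x) for x <= 0 (x in Q8) with a table lookup."""
    x_q8 = max(EXP_MIN * ONE, min(0, x_q8))        # clamp to table range
    idx = (x_q8 - EXP_MIN * ONE) >> (FRAC_BITS - EXP_STEP_BITS)
    return EXP_TABLE[idx]


def softmax_q8(logits):
    """Softmax over Q8 logits using only integer operations at run time."""
    m = max(logits)                                # keeps every argument <= 0
    exps = [exp_q8(v - m) for v in logits]
    total = sum(exps)                              # >= ONE, since exp(0) = 1.0
    return [(e << FRAC_BITS) // total for e in exps]
```

For example, `q8_dot([to_q8(1.5)], [to_q8(2.0)])` returns 768, which is 3.0 in Q8. Subtracting the maximum logit before the lookup keeps every table index in range, so the table only needs to cover non-positive inputs.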

From github.com (11 min read)
Table of contents: Architecture, Optimizing for 1970 Hardware, Prototype, Implementation Details, Building, Running
