I Built a Tiny Computer Inside a Transformer
A hands-on exploration of treating a transformer as a programmable machine rather than a trained system. By assigning hidden-state dimensions to program variables (like CPU registers), wiring attention heads as deterministic lookup tables, and using feed-forward blocks as local compute units, a simple program can be compiled directly into transformer weights without any gradient descent. The residual stream acts as working memory, each layer as a machine step, and slot reuse mirrors register allocation in traditional compilers. The post also discusses geometric speedups for attention lookups via convex-hull structures, and references Percepta's work on compiling a WebAssembly VM into transformer weights for practical deterministic execution inside LLMs.
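The sketch below shows one machine step with hand-set weights and no training. It is a minimal illustration under stated assumptions, not the author's compiler. The slot layout (`addr`, `one`, `x`, `y`, `z`) and the function names are hypothetical. The attention lookup uses argmax ("hard") attention instead of softmax. Its keys sit on a parabola, so the dot-product score peaks exactly at the requested address. The FFN uses relu(s) - relu(-s) = s to compute an exact sum.

```python
# Sketch: one "machine step" compiled by hand into weights (no gradient descent).
# Assumption: the slot names and layout below are hypothetical, for illustration only.
SLOTS = {"addr": 0, "one": 1, "x": 2, "y": 3, "z": 4}
D = len(SLOTS)


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def unit(slot):
    """Row vector that reads a single slot of the residual stream."""
    row = [0.0] * D
    row[SLOTS[slot]] = 1.0
    return row


def attention_lookup(stream, memory):
    """Read memory[addr] with a dot-product lookup.

    The query is (addr, 1) and the key for entry i is (2i, -i^2), so the score
    2*i*addr - i^2 = addr^2 - (addr - i)^2 is largest exactly when i == addr.
    Assumption: argmax ("hard") attention instead of softmax.
    """
    W_q = [unit("addr"), unit("one")]
    q = matvec(W_q, stream)
    keys = [(2.0 * i, -float(i * i)) for i in range(len(memory))]
    scores = [q[0] * k0 + q[1] * k1 for k0, k1 in keys]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return float(memory[best])


def ffn_add(stream):
    """FFN block that computes x + y exactly, using relu(s) - relu(-s) == s."""
    x, y, z = SLOTS["x"], SLOTS["y"], SLOTS["z"]
    w_sum = [0.0] * D
    w_sum[x] = w_sum[y] = 1.0
    W_in = [w_sum, [-w for w in w_sum]]  # hidden units: [x + y, -(x + y)]
    W_out = [[0.0, 0.0] for _ in range(D)]
    W_out[z] = [1.0, -1.0]               # only slot z receives h0 - h1
    return matvec(W_out, relu(matvec(W_in, stream)))


def machine_step(stream, memory):
    """Attention writes memory[addr] into x, then the FFN adds x + y into z."""
    stream = list(stream)
    stream[SLOTS["x"]] += attention_lookup(stream, memory)  # residual write-back
    delta = ffn_add(stream)
    return [s + d for s, d in zip(stream, delta)]            # residual write-back


if __name__ == "__main__":
    memory = [10, 20, 30, 40]
    s0 = [2.0, 1.0, 0.0, 5.0, 0.0]  # addr=2, one=1, x=0, y=5, z=0
    print(machine_step(s0, memory))  # [2.0, 1.0, 30.0, 5.0, 35.0]
```

Each slot behaves like a register, and each residual addition is a write to that register. Once the step has finished with a slot, a later step could overwrite it, which corresponds to register reuse.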
Table of contents
A Tiny Program We Can Compile into a Transformer
One Machine Step: Attention, FFN, and Write-Back
From Program Variables to Compilation
From Computation Graph to Weights
Scaling Program Execution to Long Deterministic Traces
Making This All Deterministic in Practice
A New AI Design Pattern: Integrating Learned Representations with Deterministic Algorithms
Further Reading