Researchers at Percepta demonstrate that a transformer can act as a full computer by implementing a WebAssembly interpreter inside transformer weights, enabling arbitrary C programs to execute for millions of steps entirely within the model's inference loop — no external tools required. The key innovation is restricting attention head dimensions to 2D, which reframes attention lookups as convex-hull queries solvable in logarithmic time rather than linear scans over the full KV cache. This 'Exponentially Fast Attention' path reduces per-step decoding cost from O(n) to O(log n), making long execution traces practical. Demos include solving the world's hardest Sudoku and running the Hungarian algorithm for min-cost matching at over 30k tokens/sec on CPU. Future directions include hybrid fast/slow architectures, compiling programs directly into weights, and growing AI systems incrementally like software libraries.
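
The convex-hull reframing can be illustrated with a small sketch. It is not Percepta's implementation: it assumes hard (argmax) attention over a fixed set of 2D keys, and the names `build_hulls`, `hull_argmax`, and `hard_attention_lookup` are hypothetical. The key that maximizes the dot product with a 2D query always lies on the convex hull of the keys, and along the upper or lower hull chain that dot product rises and then falls, so a binary search over hull edges finds it in O(log h) steps for h hull vertices.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def build_hulls(keys):
    """Return (lower, upper) hull chains of 2D keys, each sorted by x.

    This is Andrew's monotone chain, O(n log n) as a one-off build.
    Assumption: keys are static; an incremental structure for a
    growing KV cache is not shown here.
    """
    pts = sorted(set(keys))
    lower, upper = [], []
    for p in pts:
        # Lower chain keeps only strict left turns.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in pts:
        # Upper chain keeps only strict right turns.
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    return lower, upper


def hull_argmax(chain, q):
    """Binary search for the chain vertex maximizing the dot product with q.

    Along the correct chain, q . (next - current) is positive for a
    prefix of edges and non-positive afterwards.
    """
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        dx = chain[mid + 1][0] - chain[mid][0]
        dy = chain[mid + 1][1] - chain[mid][1]
        if q[0] * dx + q[1] * dy > 0:
            lo = mid + 1  # still climbing
        else:
            hi = mid      # at or past the peak
    return chain[lo]


def hard_attention_lookup(lower, upper, q):
    """Return a key maximizing q . k (hard attention, assumed here)."""
    if q[1] > 0:
        return hull_argmax(upper, q)
    if q[1] < 0:
        return hull_argmax(lower, q)
    # Horizontal query: the extreme x wins (any key if q is zero).
    return upper[-1] if q[0] >= 0 else upper[0]


def linear_scan(keys, q):
    """O(n) baseline: scan every key, as standard attention would."""
    return max(keys, key=lambda k: q[0] * k[0] + q[1] * k[1])
```

With the hulls built once, each lookup touches only a logarithmic number of hull vertices, whereas `linear_scan` visits every key. Both return a key with the same maximal score, although ties may resolve to different keys.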

22m read time. From percepta.ai
Table of contents
TL;DR
Motivation: LLMs cannot compute
How we turned LLMs to computers
What does computation mean?
More demos: Sudoku
How can computation be encoded?
The key unlock: Exponentially Fast Attention
So what is next?
Closing thoughts
