Google solved an Old RNN Problem
Google Research introduces 'Memory Caching,' a technique that addresses the long-standing limitation of RNNs losing information over long sequences. Instead of relying on a single fixed-size memory state, the approach splits sequences into segments and saves the RNN's memory state at each segment boundary. During generation, each token attends to all saved checkpoints, achieving O(NL) complexity — a middle ground between RNNs' O(L) and Transformers' O(L²). Four variants are proposed: Residual Memory, Gated Residual Memory (GRM), Memory Soup, and Sparse Selective Caching (SSC), with GRM performing best. The technique significantly closes the recall gap between RNNs and Transformers and shows that hybrid architectures are implicitly a special case of Memory Caching. Experiments are at academic scale (up to 1.3B params), so frontier-scale performance remains an open question.