The Infinite Context LLM Trick? Recursive Language Model Explained
Recursive Language Models (RLMs) treat a long input as an external environment that the model inspects programmatically, rather than as tokens to stuff into the context window. Instead of loading the full document into context, the root model receives only constant-size metadata about it and writes Python code to probe a persistent variable that holds the content. Sub-calls handle isolated, local reasoning over slices and return only summaries to the root, which acts as an orchestrator. This avoids context rot (performance degradation as inputs grow longer) and outperforms RAG on dense reasoning benchmarks such as OOLONG, where GPT-5 alone scores under 0.1% while RLM with GPT-5 reaches 58%. RLM is not a replacement for RAG in speed-sensitive workloads, but for high-value tasks that require global reasoning over large artifacts it offers a fundamentally better architecture. The paper benchmarks RLM on inputs of up to 10 million tokens without loading them into the model's context.
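The orchestration pattern described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: `build_metadata`, `stub_sub_llm`, `ROOT_PROGRAM`, `run_rlm`, and `make_demo_log` are hypothetical names, the sub-model is replaced by a deterministic stub, and the root's program is hard-coded where a real root model would generate it from the metadata.

```python
from typing import Any, Dict


def build_metadata(context: str, preview_chars: int = 40) -> Dict[str, Any]:
    """Constant-size description of the context: all the root model would see."""
    return {
        "num_chars": len(context),
        "num_lines": context.count("\n") + 1,
        "preview": context[:preview_chars],
    }


def stub_sub_llm(instruction: str, text_slice: str) -> int:
    """Hypothetical stand-in for a sub-model call.

    A real sub-call would send `instruction` and `text_slice` to an LLM.
    This stub ignores the instruction and counts lines containing "ERROR".
    """
    return sum(1 for line in text_slice.splitlines() if "ERROR" in line)


# Code the root model might write after reading only the metadata.
# Hard-coded here (assumption); it probes the persistent `context` variable
# in slices of 100 lines and keeps only the small per-slice results.
ROOT_PROGRAM = """
lines = context.splitlines()
partials = []
for start in range(0, len(lines), 100):
    piece = "\\n".join(lines[start:start + 100])
    partials.append(sub_llm("Count lines containing ERROR", piece))
answer = sum(partials)
"""


def run_rlm(context: str) -> Dict[str, Any]:
    """Run the root program in an environment that holds the full context."""
    env: Dict[str, Any] = {"context": context, "sub_llm": stub_sub_llm}
    root_view = build_metadata(context)  # the only view of the input the root gets
    exec(ROOT_PROGRAM, env)              # the environment persists across probes
    return {
        "root_view": root_view,
        "partials": env["partials"],
        "answer": env["answer"],
    }


def make_demo_log(n_lines: int = 250) -> str:
    """Synthetic log in which every tenth line is an error."""
    return "\n".join(
        f"ERROR disk full at step {i}" if i % 10 == 0 else f"INFO step {i} ok"
        for i in range(n_lines)
    )


if __name__ == "__main__":
    result = run_rlm(make_demo_log())
    print(result["root_view"])
    print(result["partials"], result["answer"])
```

The point of the design is that the full text lives only in `env["context"]`: the root works from `root_view` and the short `partials` list, so its own context stays small no matter how long the input grows.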