
Robert Youssef @rryssf_
this paper from Voltropy shows why agents should stop letting models manage their own memory ☠️ the idea is called Lossless Context Management (LCM). and the framing alone is worth your time.

Recursive Language Models (RLMs) gave models full autonomy to write their own memory scripts. the model gets a REPL, writes Python loops to chunk and process its own context. maximally flexible. also maximally unpredictable. an efficient chunking script in one rollout becomes a bad one in the next.

LCM flips this entirely. instead of asking the model to invent a memory strategy, the engine handles it deterministically. old messages get compressed into a hierarchical DAG of summaries, but every original is preserved verbatim in an immutable store. the model never loses access to anything. it just sees progressively compressed views with stable pointers it can expand on demand.

the analogy they use is perfect: GOTO vs structured programming. early programs used unrestricted GOTO for any control flow the programmer wanted. maximally expressive. also impossible to reason about at scale. Dijkstra's critique replaced GOTO with constrained primitives (for, while, if/else) that were less flexible in theory but far more reliable in practice. RLM gives models GOTO-level power over their own context. LCM gives them structured control flow. less expressive. dramatically more predictable.

the results back this up. their agent Volt (running Opus 4.6) beats Claude Code on the OOLONG long-context benchmark at every single context length from 32K to 1M tokens. average improvement over raw Opus 4.6: +29.2 for Volt versus +24.7 for Claude Code. at 512K tokens the gap is +42.4 vs +29.8. at 1M it's +51.3 vs +47.0.

below 32K both systems perform about the same. that's expected. when the full input fits in context, the architecture doesn't matter much. LCM's zero-cost continuity means it adds no overhead in this regime either. no latency penalty for short tasks.
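the "compressed views with stable pointers over an immutable store" idea can be sketched in a few lines. this is my own minimal reading of the mechanism, not the paper's implementation; all names here (`LosslessContext`, `SummaryNode`, etc.) are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    # immutable store entry: every original is kept verbatim, keyed by a stable id
    mid: int
    text: str

@dataclass
class SummaryNode:
    # a compressed view over a span of originals; child_ids are stable pointers
    summary: str
    child_ids: list

class LosslessContext:
    def __init__(self):
        self._store = {}   # mid -> Message, append-only, never mutated
        self._next_id = 0
        self.nodes = []    # summary hierarchy (a DAG in general)

    def append(self, text):
        # every message enters the immutable store before anything is compressed
        mid = self._next_id
        self._store[mid] = Message(mid, text)
        self._next_id += 1
        return mid

    def compress(self, mids, summarize):
        # deterministic engine-side compression: the model never invents this step
        node = SummaryNode(
            summary=summarize([self._store[m].text for m in mids]),
            child_ids=list(mids),
        )
        self.nodes.append(node)
        return node

    def expand(self, node):
        # expansion is lossless: stable pointers recover the verbatim originals
        return [self._store[m].text for m in node.child_ids]
```

the key property: `compress` followed by `expand` round-trips exactly, so compression only changes what the model *sees*, never what exists.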
where it gets interesting is how they handle parallelism. instead of the model writing loops to process large datasets, LCM introduces two deterministic operators: LLM-Map (stateless parallel processing, one LLM call per item) and Agentic-Map (full sub-agent per item with tool access). single tool call from the model. the engine handles all iteration, concurrency, retries, and schema validation.

Claude Code's approach: the model reads files linearly or writes bash scripts to split and process them. flexible, but the model has to correctly implement chunking logic every time AND maintain coherent state across chunks in its own context window. two sources of error compounding on each other.

Volt's approach: the model never sees the raw dataset. it specifies a per-item prompt and output schema. the engine returns aggregated results. context saturation stops being a failure mode entirely.

they also solve a problem i haven't seen addressed this cleanly before: infinite delegation. when sub-agents can spawn sub-agents, you risk an agent delegating its entire task downward forever, doing no actual work. LCM enforces a scope-reduction invariant. every sub-agent must declare what work it's delegating AND what work it's keeping. if it can't articulate what it's retaining, the call gets rejected. no arbitrary depth limits needed. the recursion is structurally guaranteed to terminate.

the limitations section is honest, which matters. they acknowledge OOLONG has contamination issues (Opus 4.6 sometimes recognizes the underlying dataset from training data). they decontaminated by excluding tasks where reasoning traces showed memorization. the overall finding holds but the gap narrows. they also argue for procedurally generated benchmarks going forward, which is the right call.

the deeper implication is one we keep relearning from software engineering history: how you manage what the model sees may matter more than giving the model tools to manage it itself.
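the LLM-Map shape, as i read it, looks roughly like this. a hypothetical sketch, not the paper's API: the model supplies only a per-item prompt template and an output schema, and the engine owns iteration, retries, and validation (`llm_map` and `call_llm` are names i made up):

```python
def llm_map(items, per_item_prompt, schema, call_llm, max_retries=3):
    """Engine-side map: one stateless LLM call per item.

    The model never sees the raw dataset or writes chunking loops;
    it only declares the prompt template and the expected output shape.
    """
    results = []
    for item in items:
        for _attempt in range(max_retries):
            # each call is independent, so the engine could run these concurrently
            raw = call_llm(per_item_prompt.format(item=item))
            # deterministic schema validation, retried by the engine on failure
            if all(k in raw and isinstance(raw[k], t) for k, t in schema.items()):
                results.append(raw)
                break
        else:
            # retries exhausted: surface a structured error instead of bad data
            results.append({"error": "schema validation failed", "item": item})
    return results
```

note that the failure mode shifts: a bad item produces a validation error in the aggregated results, not a corrupted context window.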
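the scope-reduction invariant is simple enough to sketch too. again hypothetical (the `delegate` signature and set-based task model are my own framing): a sub-agent call is rejected unless the caller names both the work it hands off and the work it keeps, and the delegated scope is strictly smaller than the task:

```python
def delegate(task, delegated, retained, spawn):
    """Scope-reduction check before spawning a sub-agent.

    task / delegated / retained are sets of work items. Because every
    accepted call strictly shrinks the scope, recursion terminates
    without any arbitrary depth limit.
    """
    if not retained:
        # the caller must articulate what it keeps, or it's just punting
        raise ValueError("rejected: caller retains no work (infinite delegation)")
    if not set(delegated) < set(task):
        # strict subset: the sub-agent's scope must be smaller than the parent's
        raise ValueError("rejected: delegated scope does not shrink the task")
    return spawn(delegated)
```

the termination argument falls out of the strict-subset check: each level of recursion operates on a strictly smaller finite set, so the depth is bounded by the task size.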
every agent framework shipping with "let the model figure it out" memory strategies might be building on the wrong abstraction entirely. not because model autonomy is bad. but because deterministic infrastructure solving the common cases reliably is almost always better than stochastic flexibility solving every case unpredictably. less GOTO. more structured control flow. the lesson keeps repeating.
