Cursor's Composer model is trained to handle long-horizon coding tasks via a technique called self-summarization, integrated directly into its reinforcement learning training loop. When Composer approaches its context limit, it pauses to generate a condensed summary of its own context before continuing. Because this summarization is part of training, the model learns to retain only the most critical information. Compared to a heavily engineered prompt-based compaction baseline, Composer's self-summarization reduces compaction error by 50% while using one-fifth of the tokens. As a case study, an early Composer 2 checkpoint solved a challenging Terminal-Bench 2.0 problem (compiling Doom for MIPS) over 170 turns, self-summarizing 100,000+ tokens down to ~1,000 tokens multiple times. The team sees this as a stepping stone toward multi-agent coordination and even longer task horizons.
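The compaction loop described above can be sketched as follows. This is a minimal toy illustration, not Cursor's implementation: `summarize`, `count_tokens`, and the token budgets are all hypothetical stand-ins (a trained model decides what to retain; here we crudely keep the trailing words and count tokens as whitespace-separated words).

```python
CONTEXT_LIMIT = 50   # toy token budget before compaction triggers
SUMMARY_BUDGET = 10  # toy target size of the self-summary

def count_tokens(messages):
    # Crude approximation: one token per whitespace-separated word.
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages):
    # Stand-in for the model summarizing its own context. A trained
    # model would learn what is critical; here we keep the last words.
    words = " ".join(m["content"] for m in messages).split()
    return " ".join(words[-SUMMARY_BUDGET:])

def step(messages, new_turn):
    messages.append({"role": "assistant", "content": new_turn})
    if count_tokens(messages) > CONTEXT_LIMIT:
        # Compact: replace the full history with a condensed summary
        # and continue the task from that summary alone.
        summary = summarize(messages)
        messages[:] = [{"role": "system",
                        "content": f"Summary so far: {summary}"}]
    return messages

msgs = [{"role": "user", "content": "compile doom for mips"}]
for i in range(20):
    step(msgs, f"turn {i}: ran build command and inspected output logs")
```

After 20 turns the history has been compacted several times, so `msgs` holds only a short summary plus the most recent turns, never exceeding the budget.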

6 min read · From cursor.com
Table of contents

- The limits of compaction techniques
- Self-summarization as a trained behavior
- Token-efficient compaction
- Solving hard problems
- Toward a long-horizon future