Cursor's Composer model is trained to handle long-horizon coding tasks via a technique called self-summarization, integrated directly into its reinforcement learning training loop. When Composer approaches its context limit, it pauses to generate a condensed summary of its own context before continuing. Because this summarization is part of training, the model learns to retain only the most critical information. Compared to a heavily engineered prompt-based compaction baseline, Composer's self-summarization reduces compaction error by 50% while using one-fifth of the tokens. As a case study, an early Composer 2 checkpoint solved a challenging Terminal-Bench 2.0 problem (compiling Doom for MIPS) over 170 turns, self-summarizing 100,000+ tokens down to ~1,000 tokens multiple times. The team sees this as a stepping stone toward multi-agent coordination and even longer task horizons.
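The compaction loop described above can be sketched as follows. This is a minimal toy illustration, not Cursor's implementation: `summarize`, `count_tokens`, and the token budgets are all hypothetical stand-ins (a trained model decides what to retain; here we crudely keep the trailing words and count tokens as whitespace-separated words).

```python
CONTEXT_LIMIT = 50   # toy token budget before compaction triggers
SUMMARY_BUDGET = 10  # toy target size of the self-summary

def count_tokens(messages):
    # Crude approximation: one token per whitespace-separated word.
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages):
    # Stand-in for the model summarizing its own context. A trained
    # model would learn what is critical; here we keep the last words.
    words = " ".join(m["content"] for m in messages).split()
    return " ".join(words[-SUMMARY_BUDGET:])

def step(messages, new_turn):
    messages.append({"role": "assistant", "content": new_turn})
    if count_tokens(messages) > CONTEXT_LIMIT:
        # Compact: replace the full history with a condensed summary
        # and continue the task from that summary alone.
        summary = summarize(messages)
        messages[:] = [{"role": "system",
                        "content": f"Summary so far: {summary}"}]
    return messages

msgs = [{"role": "user", "content": "compile doom for mips"}]
for i in range(20):
    step(msgs, f"turn {i}: ran build command and inspected output logs")
```

After 20 turns the history has been compacted several times, so `msgs` holds only a short summary plus the most recent turns, never exceeding the budget.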

6 min read · From cursor.com
Table of contents

- The limits of compaction techniques
- Self-summarization as a trained behavior
- Token-efficient compaction
- Solving hard problems
- Toward a long-horizon future