Long-running AI agents that operate over hours, days, or weeks represent the next evolution beyond single-session chat-based agents. Three core engineering challenges must be solved: finite context windows, lack of persistent state, and unreliable self-verification. Addy Osmani surveys how Anthropic, Cursor, and Google have converged on similar architectural patterns — separating the model loop (brain) from execution sandboxes (hands) and durable session logs — while differing in surface area and productization. Practical patterns covered include the Ralph loop (a simple bash-based task loop), checkpoint-and-resume, human-in-the-loop delegation, memory-layered context, ambient processing, and fleet orchestration. Key takeaways: define done-conditions before the agent starts, separate evaluator from generator, invest in append-only session logs, and treat context resets as first-class operations. Real limitations remain around cost, security, alignment drift, and verification overhead.
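The Ralph loop mentioned above can be captured in a few lines of bash: rerun the agent against the same task prompt until a pre-defined done-condition appears in the session log, with each pass starting from a fresh context. A minimal sketch follows; `run_agent` is a hypothetical stand-in for a real agent CLI, and the `DONE` marker is an assumed done-condition, not part of any specific tool.

```shell
#!/usr/bin/env bash
# Minimal sketch of a Ralph loop: keep re-invoking the agent with the
# same prompt until a done-condition is met, treating each pass as a
# fresh context. `run_agent` is a hypothetical stub for an agent CLI.
attempt=0
run_agent() {
  attempt=$((attempt + 1))
  # A real invocation would call an agent CLI here; the stub "succeeds"
  # on the third pass by writing the done-marker to the session log.
  [ "$attempt" -ge 3 ] && echo "DONE" >> session.log
}

: > session.log                       # append-only session log, fresh per task
until grep -q "DONE" session.log; do  # done-condition defined before the agent starts
  run_agent                           # each iteration is a context reset
done
echo "finished after $attempt passes"
```

The key property is that the loop, not the model, owns termination: the done-condition is checked externally (here with `grep` on the log), which is what makes context resets a first-class, safe operation.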
Table of contents
- What “long-running” actually means
- Why this matters
- The three walls every long-running agent hits
- The Ralph loop: one of the simpler practitioner versions of long-running agents
- Anthropic: harnesses, then the brain/hands/session split
- Cursor: planners, workers, judges
- Google: long-running agents on the Agent Platform
- Five patterns for long-running agents in production
- So how do you actually build one today?
- There are some real limitations right now
- Where this is going