Long-running AI agents that operate over hours, days, or weeks represent a significant shift from single-session chat-based agents. Three core engineering challenges must be solved: finite context windows, lack of persistent state between sessions, and unreliable self-verification. The post surveys how Anthropic, Cursor, and Google have converged on similar architectural patterns — separating the model loop (brain) from execution sandboxes (hands) and durable session logs — while differing in surface area and productization. Practical patterns covered include checkpoint-and-resume, delegated human approval, memory-layered context, ambient processing, and fleet orchestration. The Ralph loop (a simple bash-based task iteration pattern) is presented as a minimal viable implementation. Key unsolved challenges include cost control, security, alignment drift over many context windows, and the human skill of writing precise enough specs for autonomous execution.
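The Ralph loop mentioned above can be sketched in a few lines of bash. This is a hedged illustration, not the canonical implementation: the `run_agent` function is a hypothetical stand-in for whatever agent CLI you use, and the task-file format is an assumption. The essential idea is that each iteration gives the agent a fresh context window and checkpoints progress, so work survives beyond any single session.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a real agent CLI invocation;
# replace with your actual tool.
run_agent() {
  echo "agent iteration on: $1"
}

# One task per line; completed tasks are removed from the file.
TASKS_FILE="$(mktemp)"
printf '%s\n' "implement parser" "write tests" "update docs" > "$TASKS_FILE"

# The loop: pop one task per iteration, run the agent with a fresh
# context, checkpoint on success, repeat until the list is empty.
while [ -s "$TASKS_FILE" ]; do
  task="$(head -n 1 "$TASKS_FILE")"
  if run_agent "$task"; then
    # Checkpoint: in a real setup this would also be a git commit.
    tail -n +2 "$TASKS_FILE" > "$TASKS_FILE.tmp" && mv "$TASKS_FILE.tmp" "$TASKS_FILE"
  else
    echo "task failed, will retry: $task" >&2
    sleep 1
  fi
done
echo "all tasks done"
rm -f "$TASKS_FILE"
```

Because state lives in the task file (and, in practice, in version control) rather than in the model's context, the loop can be killed and restarted at any point without losing progress.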
Table of contents
- What “long-running” actually means
- Why this matters
- The three walls every long-running agent hits
- The Ralph loop: one of the simpler practitioner versions of long-running agents
- Anthropic: harnesses, then the brain/hands/session split
- Cursor: planners, workers, judges
- Google: long-running agents on the Agent Platform
- Five patterns for long-running agents in production
- So how do you actually build one today?
- There are some real limitations right now
- Where this is going