An agent's capability scales with the number of steps it can take and the tools it can use, but that scale creates compounding challenges: failures become increasingly likely, and iteration slows dramatically. A taxonomy of five agent capability levels (Reflexive, Conversational, Orchestrated, Coordinated, Autonomous) is presented, with a 'complexity cliff' between L2 and L3 where standard frameworks break down. Above the cliff, restarting from scratch becomes prohibitively expensive, dangerous because of side effects, or impossible. Durable Execution, as implemented by Temporal, addresses both failure recovery and iteration velocity by persisting every step immutably, enabling replay from any point in the execution history, and allowing history branching for parallel experiments without re-running completed work. Concrete examples show up to a 91% reduction in the steps needed for prompt-tuning experiments.
Table of contents
The problem: Agent capability is a function of time and tools
The complexity cliff: Where most agent frameworks fail
Levels of agent capability
How Durable Execution unlocks agent capability