Better-Harness is a system for iteratively improving AI agent harnesses using evaluations as a learning signal. The approach treats evals like training data in classical ML, using them to guide autonomous harness updates through a loop of sourcing evals, splitting into optimization/holdout sets, running baselines, optimizing,
Table of contents
Evals are training data for Agents
Sourcing good evals
Better-Harness: a recipe for hill climbing your harness
Examples of harness changes
Results from the Better-Harness loop
Evals maintenance & regressions
The Future: automated error detection & fixes
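
To make the loop in the summary concrete, here is a minimal sketch of the split, baseline, and optimize steps. Everything in it is an assumption for illustration rather than the Better-Harness implementation: the `Eval` and `Harness` types, the function names, the exact-match scoring, the 30% holdout fraction, and the acceptance rule (a candidate must improve on the optimization set without regressing on the holdout set) are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes, not taken from the article:
# an eval case is {"input": ..., "expected": ...}, a harness maps input to output.
Eval = Dict[str, str]
Harness = Callable[[str], str]


def split_evals(
    evals: List[Eval], holdout_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Eval], List[Eval]]:
    """Shuffle deterministically, then split into (optimization, holdout) sets."""
    shuffled = list(evals)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def score(harness: Harness, evals: List[Eval]) -> float:
    """Fraction of eval cases whose output exactly matches the expected value."""
    if not evals:
        return 0.0
    passed = sum(1 for case in evals if harness(case["input"]) == case["expected"])
    return passed / len(evals)


def accept_candidate(
    baseline: Harness,
    candidate: Harness,
    optimization_set: List[Eval],
    holdout_set: List[Eval],
) -> bool:
    """Assumed rule: strictly better on optimization, no worse on holdout."""
    improves = score(candidate, optimization_set) > score(baseline, optimization_set)
    no_regression = score(candidate, holdout_set) >= score(baseline, holdout_set)
    return improves and no_regression


# Toy usage: the "task" is doubling a number given as a string.
evals = [{"input": str(i), "expected": str(i * 2)} for i in range(10)]
optimization_set, holdout_set = split_evals(evals)
baseline_harness = lambda text: text                 # echoes the input unchanged
candidate_harness = lambda text: str(int(text) * 2)  # doubles the input
accepted = accept_candidate(
    baseline_harness, candidate_harness, optimization_set, holdout_set
)
```

Scoring the holdout set separately is what makes the classical ML analogy useful here: a harness change that only helps on the evals it was tuned against is rejected, much like a model that overfits its training data.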