SkillsBench is a new benchmark of 86 tasks across 11 domains, designed to measure whether Agent Skills (structured packages of procedural knowledge) actually improve LLM agent performance. Tests of 7 agent-model configurations over 7,308 trajectories show that curated Skills raise average pass rates by 16.2 percentage points.