We Ran 250 AI Agent Evals to Find Out if Skills Beat Docs. The Answer Is More Complicated Than We Expected


Wix Engineering ran 250 controlled evaluations comparing AI agent performance with standard docs, agent-optimized docs, and purpose-built skills. Key findings: optimizing the docs alone raised CLI task completion from 67% to 87% and cut token usage by 35%. Skills outperformed docs only when they were accurate and well maintained; small errors such as misaligned scaffolding or broken code snippets erased their advantage entirely. On REST API tasks, docs-optimized runs were 31% faster even though skill-based runs used fewer tokens, because MCP tool fragmentation led to more sequential calls. An unexpected finding: skills made agents less exploratory, constraining the solution space. The recommended framework treats agent-optimized docs as the backbone and skills as a caching layer for common tasks, with regular evals to detect drift.
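The "docs as backbone, skills as a caching layer" framework can be illustrated with a minimal sketch. Everything here is hypothetical: `Skill`, `ContextResolver`, `record_eval`, and the 0.8 `drift_threshold` are illustrative names and values, not taken from the article. The sketch assumes each skill carries a pass rate from recent evals; a skill is served only while that rate stays above the threshold, and otherwise the resolver falls back to the agent-optimized docs.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical purpose-built skill: pre-packaged guidance for one common task."""
    task: str
    guidance: str
    last_eval_pass_rate: float  # fraction of recent eval runs that succeeded with this skill


@dataclass
class ContextResolver:
    """Hypothetical resolver: docs are the source of truth, skills act as a cache."""
    docs: dict[str, str]  # task -> agent-optimized doc text (the backbone)
    skills: dict[str, Skill] = field(default_factory=dict)
    drift_threshold: float = 0.8  # assumed cutoff below which a skill is treated as stale

    def resolve(self, task: str) -> str:
        skill = self.skills.get(task)
        # Cache hit: serve the skill only while recent evals say it is still accurate.
        if skill is not None and skill.last_eval_pass_rate >= self.drift_threshold:
            return skill.guidance
        # Cache miss or drifted skill: fall back to the docs backbone.
        return self.docs.get(task, "")

    def record_eval(self, task: str, pass_rate: float) -> None:
        # Regular evals refresh each skill's health so drift is detected.
        if task in self.skills:
            self.skills[task].last_eval_pass_rate = pass_rate


resolver = ContextResolver(
    docs={"deploy": "Docs: full deploy reference", "rollback": "Docs: rollback reference"},
    skills={"deploy": Skill("deploy", "Skill: deploy steps", 0.95)},
)
print(resolver.resolve("deploy"))    # skill is healthy, so the cached guidance is used
resolver.record_eval("deploy", 0.5)  # an eval run reveals drift
print(resolver.resolve("deploy"))    # drifted skill is bypassed in favor of the docs
```

The design choice mirrors the summary's point that a stale skill can be worse than no skill: gating the cache on eval results means a drifted skill degrades to the docs path instead of steering the agent wrong.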

#llm #ai-agents #nocode #mcp

May 06 • 8 min read • From wix.engineering

Table of contents

The Problem We Were Trying to Solve
Methodology
What We Found
A Framework for Docs and Skills
Conclusion
