Building production AI agents requires context engineering over prompt engineering. A production failure with Metabot revealed that locally optimized components created contradictory signals in the LLM's context window. Three key patterns emerged: LLM-optimized data representations with structured templates, just-in-time instructions delivered at relevant moments rather than front-loaded in system prompts, and explicit error guidance with recovery paths. Benchmarks often miss real-world chaos where users ask ambiguous questions against messy data. Success requires building for chaos and production reality, not polished happy-path demos.
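Of the three patterns, just-in-time instructions are the easiest to sketch in code. The following is a minimal illustration, not Metabot's actual implementation: all names (`TOOL_INSTRUCTIONS`, `build_messages`, the tool keys) are hypothetical. The idea is to keep the system prompt short and inject tool-specific guidance into the context window only when the agent is about to use that tool.

```python
# Minimal sketch (hypothetical, not Metabot's code) of the
# "just-in-time instructions" pattern: tool-specific guidance is
# appended to the context only at the moment it is relevant, instead
# of front-loading every rule into the system prompt.

SYSTEM_PROMPT = "You are a data analytics agent."  # deliberately short

# Hypothetical per-tool guidance, delivered only when that tool is next.
TOOL_INSTRUCTIONS = {
    "run_sql": (
        "Quote column names that contain spaces. "
        "If the query errors, re-read the schema before retrying."
    ),
    "make_chart": (
        "Prefer a bar chart for categorical comparisons; "
        "never plot more than 12 series."
    ),
}

def build_messages(history, next_tool=None):
    """Assemble the context window: stable system prompt first,
    conversation history next, tool guidance only when needed."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)
    if next_tool in TOOL_INSTRUCTIONS:
        messages.append(
            {"role": "system", "content": TOOL_INSTRUCTIONS[next_tool]}
        )
    return messages

history = [{"role": "user", "content": "Show revenue by region"}]
msgs = build_messages(history, next_tool="run_sql")
```

The payoff is that guidance for tools the agent never touches stays out of the context window entirely, so it cannot contradict the instructions that are actually in play.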
Table of contents
- What we were building (and why it’s hard)
- What broke: local optimization
- What worked: context engineering over prompt engineering
- The benchmark problem in AI analytics agents
- Build for chaos, not happy paths
- Try it yourself