Building AI agents locally is trivial, but running them in production at enterprise scale creates massive hidden infrastructure debt. Drawing a parallel to Google's 2015 ML technical debt paper, the article identifies seven infrastructure blocks surrounding agents: integrations (credential sprawl, API changes), context lake (stale runtime data, missing decision traces), agent registry (invisible agents, no lifecycle management), measurement (observability, evals, ROI, feedback loops), human-in-the-loop (approval workflows, control planes), governance (access control, audit trails, cost limits), and orchestration (non-deterministic handoffs, ownership gaps). The debt compounds as more teams adopt agents independently, often forcing platform teams to retrofit governance after sprawl is already entrenched. The recommendation is to start with a visibility audit and establish working definitions before the pain forces the issue.
Table of contents
1. Integrations
2. Context lake
3. Agent registry
4. Measurement
5. Human-in-the-loop
6. Governance
7. Orchestration
When the debt hits
This happened before with microservices
What to do about it