AI SRE tools from vendors like PagerDuty, Datadog, and Microsoft focus on diagnostics and automated mitigation, but none address incident management — the coordination work that keeps a team of responders aligned. The author argues that individual AI agents are inherently prone to fixation (tunnel vision), meaning incident response will always require multiple agents or humans working together. The real unsolved problem is maintaining 'common ground' — keeping all responders synchronized on current hypotheses, actions, and system state — which requires an active, proactive coordination layer that no current AI SRE product attempts to provide.

7m read time · From surfingcomplexity.blog
Table of contents
Fixation: the single-agent problem
Maintaining common ground is active work
