A retrospective on pre-LLM AI in observability reveals that anomaly detection, despite heavy vendor investment and marketing, largely failed to deliver practical value — with only 12% of SREs using it regularly in 2021. The core problem: production systems are inherently noisy, generating floods of false positives that cause alert fatigue. Meanwhile, simple statistical methods like P99 latency monitoring and RED metrics consistently outperform ML-based approaches. The few ML features that genuinely proved useful — log pattern grouping and automated threshold setting — became so standard they're no longer marketed as AI. The lesson: flashy AI demos rarely translate to real-world reliability, and solid fundamentals beat hype.
Table of contents
Business Context
Can ML/AI Find Problems for Us?
The Principle of Least Power: Statistics such as P99
Useful Machine Learning in Observability
Learnings