The growing impact of costly large language model outages demands a return to architectural basics to maintain resilience.

InfoWorld

Enterprises are rapidly adopting cloud-hosted LLMs but are neglecting fundamental architectural resilience principles. Major LLM outages in 2025 caused billions in losses and exposed how centralized AI dependencies create systemic risk. The article outlines three remediation steps: audit LLM dependency chains to identify hidden single points of failure; implement graceful degradation with fallback mechanisms (local models, rules-based systems, caching); and run regular simulation drills and live failover tests to verify resilience before a crisis occurs.
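
The graceful degradation step can be illustrated with a minimal Python sketch. It is not taken from the article: the names `DegradingClient`, `ProviderUnavailable`, `cloud_llm`, `local_model`, and `rules_based` are hypothetical stand-ins, and the ordering of tiers (hosted model, then cache, then local model, then rules) is one assumed policy among several reasonable ones.

```python
from typing import Callable, Dict, List, Tuple


class ProviderUnavailable(Exception):
    """Raised by a backend that cannot serve the request (hypothetical)."""


def cloud_llm(prompt: str) -> str:
    # Hypothetical stand-in for a hosted LLM call; here it simulates an outage.
    raise ProviderUnavailable("hosted LLM outage (simulated)")


def local_model(prompt: str) -> str:
    # Hypothetical stand-in for a smaller self-hosted model.
    return f"[local] short answer to: {prompt}"


def rules_based(prompt: str) -> str:
    # Deterministic last resort that needs no model at all.
    return "Service is degraded; your request has been queued."


Backend = Callable[[str], str]


class DegradingClient:
    """Tries the primary backend, then a cache, then each fallback in order."""

    def __init__(self, primary: Backend, fallbacks: List[Tuple[str, Backend]]):
        self.primary = primary
        self.fallbacks = list(fallbacks)
        self.cache: Dict[str, str] = {}

    def ask(self, prompt: str) -> Tuple[str, str]:
        """Return (tier_name, answer) so callers can see which tier responded."""
        try:
            answer = self.primary(prompt)
        except ProviderUnavailable:
            pass  # fall through to the degraded tiers below
        else:
            # Only full-quality answers are cached (an assumed policy).
            self.cache[prompt] = answer
            return "primary", answer

        if prompt in self.cache:
            return "cache", self.cache[prompt]

        for name, backend in self.fallbacks:
            try:
                return name, backend(prompt)
            except ProviderUnavailable:
                continue

        raise ProviderUnavailable("all tiers failed and nothing is cached")


if __name__ == "__main__":
    client = DegradingClient(
        cloud_llm, [("local", local_model), ("rules", rules_based)]
    )
    tier, answer = client.ask("Summarize today's incident report")
    print(tier, answer)
```

Returning the tier name alongside the answer also supports the drill step: a failover test can disable the primary backend on purpose and assert that responses come from the expected degraded tier.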

Cloud-based LLMs risk enterprise stability