Convex experienced a major outage affecting T3 Chat on June 1, 2025. Three interconnected problems combined to cause it: search index compaction triggered unnecessary query invalidations, inadequate client backoff logic produced a thundering herd of reconnections, and operational tooling accidentally downgraded hardware resources during the incident. The outage lasted over 5 hours, and T3 Chat was unusable for nearly 3 of them. The root cause was search index compaction invalidating thousands of subscribed queries at once. Aggressive client reconnection and reduced server capacity then turned that spike into an overload of the whole system.
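The summary does not show Convex's actual client reconnection code. As a generic illustration of why backoff logic matters for a thundering herd, here is a minimal sketch of exponential backoff with "full jitter," a common mitigation. The function names and the `base` and `cap` values are hypothetical choices for this example and are not taken from Convex. The sketch simulates many clients that all lose their connection at the same moment and counts how many try to reconnect within the same 100 ms window.

```python
import random
from collections import Counter
from typing import Iterable, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Hypothetical reconnect delay (seconds) using exponential backoff with full jitter.

    attempt: consecutive failed reconnects so far (0 for the first retry).
    The ceiling grows as base * 2**attempt and is clamped to cap. The actual delay
    is drawn uniformly from [0, ceiling], so clients that failed together spread
    their retries out instead of reconnecting in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    source = rng if rng is not None else random
    # Clamp the exponent so huge attempt counts cannot overflow the float conversion.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    return source.uniform(0.0, ceiling)


def peak_reconnects_per_bucket(delays: Iterable[float], bucket: float = 0.1) -> int:
    """Largest number of reconnect attempts that land in any single time bucket."""
    counts = Counter(int(d / bucket) for d in delays)
    return max(counts.values())


if __name__ == "__main__":
    rng = random.Random(42)
    clients = 10_000
    attempt = 3
    # Without jitter, every client waits exactly min(cap, base * 2**attempt) = 4.0 s.
    lockstep = [min(30.0, 0.5 * 2 ** attempt)] * clients
    jittered = [backoff_delay(attempt, rng=rng) for _ in range(clients)]
    print("peak without jitter:", peak_reconnects_per_bucket(lockstep))
    print("peak with full jitter:", peak_reconnects_per_bucket(jittered))
```

In this simulation, every client reconnects in the same 100 ms bucket when there is no jitter. With full jitter, the same retries spread across roughly 40 buckets, so the peak drops to a few hundred. Exponential growth by itself only delays the herd. Randomization is what breaks up its synchronization.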