Convex experienced a major outage affecting T3 Chat on June 1, 2025, caused by three interconnected problems: search index compaction that triggered unnecessary query invalidations, inadequate client backoff logic that created a thundering herd, and operational tooling that accidentally downgraded hardware resources during the incident. The outage lasted over 5 hours, and T3 Chat was unusable for nearly 3 hours. The root cause was search index compaction invalidating thousands of subscribed queries simultaneously, which overwhelmed the system when combined with aggressive client reconnection behavior and reduced server capacity.
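
To illustrate the client backoff problem, here is a minimal sketch of capped exponential backoff with full jitter, a common way to keep many clients from reconnecting in lockstep. It is not Convex's actual client logic: the function name `backoff_delay` and the `base` and `cap` values are assumptions chosen for illustration.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.5,   # hypothetical initial ceiling, in seconds
    cap: float = 60.0,   # hypothetical maximum ceiling, in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Return a randomized wait (seconds) before reconnect attempt `attempt` (0-based)."""
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt counts cannot overflow.
    exponent = min(max(attempt, 0), 32)
    # The ceiling doubles with each failed attempt, up to `cap`.
    ceiling = min(cap, base * (2 ** exponent))
    # Full jitter: pick uniformly in [0, ceiling] so clients spread out
    # instead of all retrying at the same instant.
    return source.uniform(0.0, ceiling)
```

A client would sleep for `backoff_delay(n)` seconds after its n-th consecutive failed reconnect and reset `n` to zero once a connection succeeds. Without the jitter, clients that disconnected together would also retry together, which is the thundering herd pattern described above.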

9m read time. From news.convex.dev
Table of contents: Outage Context, Impact, Outage Timeline, Root Cause Analysis, Follow-Up Action Items
