A deep-dive into a CUBIC congestion control bug in Cloudflare's quiche QUIC library, where the congestion window became permanently pinned at its minimum floor after heavy packet loss, causing a 'death spiral'. The root cause was a Linux kernel optimization for handling idle periods that was incorrectly ported to user-space QUIC: when cwnd collapsed to two packets, every ACK cycle drove bytes_in_flight to zero, which the code misidentified as application idleness rather than congestion-limited waiting. This inflated the idle delta by a full RTT, pushing the recovery start time into the future and preventing cwnd growth. The fix was to measure idle duration from the last ACK time rather than the last send time, accurately distinguishing true idleness from normal RTT wait periods. The change restored 100% pass rates in the test suite.
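The mechanism described above can be illustrated with a toy simulation. This is a minimal sketch, not quiche's actual code: the function `simulate`, its parameters, and the variable names (`epoch_start_ms`, `last_sent_ms`, `last_ack_ms`) are hypothetical, and the model assumes a sender pinned at a two-packet window that shifts its recovery start time forward by the measured idle duration whenever it sends with nothing in flight.

```python
def simulate(rounds, rtt_ms, measure_from_ack, app_idle_ms=0):
    """Toy model: return ms elapsed since the (shifted) recovery start time.

    All names here are illustrative assumptions, not quiche identifiers.
    Times are integer milliseconds to keep the arithmetic exact.
    """
    now_ms = 0
    epoch_start_ms = 0   # hypothetical "recovery start time"
    last_sent_ms = 0
    last_ack_ms = 0
    bytes_in_flight = 0

    for _ in range(rounds):
        # On send: if nothing is in flight, the gap is treated as idle time
        # and the recovery start time is pushed forward by that amount.
        if bytes_in_flight == 0:
            reference_ms = last_ack_ms if measure_from_ack else last_sent_ms
            idle_ms = max(0, now_ms - reference_ms)
            epoch_start_ms += idle_ms
        last_sent_ms = now_ms
        bytes_in_flight = 2  # window pinned at two packets

        # One RTT later the ACK arrives and drains everything in flight.
        now_ms += rtt_ms
        last_ack_ms = now_ms
        bytes_in_flight = 0

        # Optional genuine application idleness before the next send.
        now_ms += app_idle_ms

    return now_ms - epoch_start_ms


# Measuring from the last send counts each RTT wait as idleness.
stuck = simulate(rounds=10, rtt_ms=100, measure_from_ack=False)
# Measuring from the last ACK counts only time spent with nothing to wait for.
growing = simulate(rounds=10, rtt_ms=100, measure_from_ack=True)
print(stuck, growing)
```

In this model, measuring from the last send leaves the elapsed time stuck at 100 ms after ten round trips, because the start time advances by one RTT per cycle. Measuring from the last ACK lets it reach 1000 ms, which is what a CUBIC-style growth function would need in order to raise the window. Setting `app_idle_ms` above zero shows that genuine idleness is still discounted in both variants; the send-based variant simply adds a full RTT on top of it.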
Table of contents
CUBIC's logic in a nutshell
The symptom: a test that fails 61% of the time
The anomaly: 999 state transitions with zero loss
Tracing the root cause
The fix: measuring idle from the right moment
Validation
Takeaways