A conference talk covering four key patterns for building reliable distributed systems. The speaker explains why naive retries cause a 'spiral of death' under load and how a token bucket caps retry-induced overload at roughly 1% additional traffic. The fallback pattern is examined critically, with real examples such as the OpenAI/ChatGPT outage, in which a cache failure cascaded into database overload. Load shedding is presented as a way to prioritize high-value traffic by dropping low-priority requests (bots, free-tier users) before they time out and waste server resources. Finally, the constant work pattern is introduced as a way to keep load on downstream systems predictable and deterministic regardless of external event volume, illustrated with AWS EC2 provisioning and DNS update examples.
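
To make the retry budget idea concrete, here is a minimal sketch of a token bucket that caps retries at about 1% of first-attempt traffic. It is not taken from the talk: the names `RetryBudget` and `call_with_budget`, the 100-requests-per-retry ratio, and the burst cap are all assumptions chosen for illustration. Each first attempt earns one credit, and each retry spends 100 credits, so during a full outage the extra load stays near 1%.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryBudget:
    """Hypothetical token bucket: one retry is earned per N first attempts."""

    def __init__(self, requests_per_retry: int = 100, max_retries_banked: int = 10) -> None:
        # Integer credits avoid floating-point drift when adding small fractions.
        self.cost = requests_per_retry
        self.max_credits = requests_per_retry * max_retries_banked
        self.credits = 0

    def record_request(self) -> None:
        # Every first attempt deposits one credit, up to the burst cap.
        self.credits = min(self.max_credits, self.credits + 1)

    def try_acquire_retry(self) -> bool:
        # A retry is allowed only if a full retry's worth of credits is banked.
        if self.credits >= self.cost:
            self.credits -= self.cost
            return True
        return False


def call_with_budget(op: Callable[[], T], budget: RetryBudget, max_attempts: int = 3) -> T:
    """Run op, retrying on failure only while the shared budget allows it."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            is_last_attempt = attempt == max_attempts - 1
            if is_last_attempt or not budget.try_acquire_retry():
                raise  # Fail fast instead of piling load onto a struggling server.
    raise RuntimeError("unreachable: max_attempts must be at least 1")
```

In this sketch the budget is shared across all calls from one client, which is what keeps the aggregate retry rate bounded. A per-request retry limit alone would still multiply traffic by `max_attempts` when every request fails.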

33m watch time
