A detailed walkthrough of modeling the AWS DynamoDB outage race condition using TLA+. The post explains how splitting an enactor's operation into three non-atomic steps (receive, apply, cleanup) created a time-of-check to time-of-update flaw. When one enactor lagged while another moved faster through the steps, a stale plan could overwrite the current one and then be deleted, violating the system invariant. Includes an interactive browser-based trace explorer (Spectacle) for visualizing the violation step-by-step.
Sort: