Tigris, an S3-compatible object storage service, used Antithesis, a deterministic fault-injection simulator, to find three cache coherence bugs that standard CI couldn't catch. All three shared the same root cause: a window between a metadata operation committing in FoundationDB (FDB) and the edge cache reflecting that change. The bugs were a delete-then-read race, in which deleted objects were still served from cache; rename operations that failed when cache invalidation timed out; and deleted objects that resurfaced under regional failure.

The fixes introduced a three-layer defense: eager cache invalidation before the FDB commit on the write path, tombstone barriers that block stale re-population on the read path, and tombstones returned by the metadata layer so gateways can reject stale cache entries. Over roughly nine months, the Antithesis setup executed 261 workload runs totaling 73,178 virtual hours and exploring 20.3 million unique system states. No cache coherence bugs have been observed since the fixes landed.
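To make the three layers concrete, here is a toy, single-process sketch in Python. It is not Tigris's implementation: `MetadataStore` stands in for FoundationDB, `EdgeCache` for the edge cache, and every other name (`Barrier`, `delete_object`, `revalidate`, integer versions drawn from one counter) is a hypothetical illustration. It replays the delete-then-read race step by step, shows a barrier blocking the stale re-population, and then shows a metadata tombstone letting the gateway evict an entry whose invalidation was lost.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Entry:      # a live object version (hypothetical model)
    version: int
    data: bytes


@dataclass(frozen=True)
class Tombstone:  # metadata record left behind by a delete (layer 3)
    version: int


@dataclass(frozen=True)
class Barrier:    # cache marker blocking re-population at or below `version` (layer 2)
    version: int


class MetadataStore:
    """Toy stand-in for FoundationDB: one committed-version counter."""

    def __init__(self) -> None:
        self.version = 0
        self.records: dict = {}

    def put(self, key: str, data: bytes) -> Entry:
        self.version += 1
        self.records[key] = Entry(self.version, data)
        return self.records[key]

    def delete(self, key: str) -> Tombstone:
        # Keep a tombstone instead of erasing the key, so callers can tell
        # "deleted at version N" apart from "never existed".
        self.version += 1
        self.records[key] = Tombstone(self.version)
        return self.records[key]

    def get(self, key: str) -> Optional[Union[Entry, Tombstone]]:
        return self.records.get(key)


class EdgeCache:
    def __init__(self) -> None:
        self.slots: dict = {}

    def populate(self, key: str, entry: Entry) -> bool:
        # Refuse anything not newer than what the slot already holds,
        # including a barrier left by a delete.
        current = self.slots.get(key)
        if current is not None and current.version >= entry.version:
            return False
        self.slots[key] = entry
        return True


def delete_object(key: str, meta: MetadataStore, cache: EdgeCache) -> None:
    current = meta.get(key)
    if isinstance(current, Entry):
        # Layer 1: invalidate eagerly, before the metadata commit, leaving a
        # barrier at the version being superseded (layer 2).
        cache.slots[key] = Barrier(current.version)
    meta.delete(key)


def revalidate(key: str, meta: MetadataStore, cache: EdgeCache) -> Optional[Entry]:
    """Serve a cached entry only if metadata reports nothing newer."""
    cached = cache.slots.get(key)
    if not isinstance(cached, Entry):
        return None
    record = meta.get(key)
    if isinstance(record, Tombstone) and record.version > cached.version:
        # Layer 3: the tombstone proves the entry predates a delete.
        del cache.slots[key]
        return None
    return cached


meta, cache = MetadataStore(), EdgeCache()

# Race: a reader fetches v1 from metadata, then stalls before populating.
meta.put("a", b"v1")
stale = meta.get("a")
delete_object("a", meta, cache)             # Barrier(1), then Tombstone(2)
assert cache.populate("a", stale) is False  # stale re-population blocked

# Re-creating the object yields a newer version, which the barrier admits.
fresh = meta.put("a", b"v2")
assert cache.populate("a", fresh) is True

# Lost invalidation: the delete commits without touching the cache.
meta.delete("a")
assert revalidate("a", meta, cache) is None
```

The barrier records the version being superseded rather than the delete's own version: with eager invalidation it is written before the delete commits, so the delete's version is not known yet. A later re-creation receives a higher version and passes the check. The sketch revalidates against metadata explicitly for simplicity; when a real gateway would consult the metadata layer is a design decision this toy does not model.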
Table of contents
What we were testing before, and what we weren't
Three bugs, one seam
The fix: reorder, then barrier
Three layers, one property
What's next