Building a Database on S3
A review of a 2008 research paper that proposed building a relational database on Amazon S3, using SQS as a write-ahead log and S3 as a page store. The design pioneered the separation of storage from compute that is now central to cloud-native systems such as Aurora, Snowflake, and Delta Lake. Key challenges included SQS's non-FIFO delivery, which requires idempotent log records; an atomicity protocol for all-or-nothing commits; B-link trees, which allow lock-free reads on stale S3 pages; and weak isolation, where last-writer-wins replaces the traditional ANSI SQL isolation levels. The review notes that the paper's claim that snapshot isolation requires a centralized counter is outdated now that hybrid logical clocks exist. Despite its clunky 2008-era protocols, the review credits the paper as a conceptual precursor to modern data lake and lakehouse architectures.
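The interaction between idempotent log records and last-writer-wins can be illustrated with a small sketch. This is not the paper's record format or protocol: the `LogRecord` fields, the timestamp-based tie-break, and the `apply_record` and `checkpoint` helpers are all hypothetical names chosen for illustration, and the queue and page store are modeled as an in-memory list and dict rather than SQS and S3. The point it tries to show is that if applying a record only ever keeps the version with the highest (timestamp, record id) pair, then replaying a record or receiving records out of order leaves the page in the same state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LogRecord:
    # All fields are assumptions made for this sketch.
    record_id: str   # unique id, used only to break timestamp ties
    key: str         # key of the row being written
    value: str       # new value for that key
    timestamp: int   # writer-assigned timestamp


def apply_record(page: Dict[str, LogRecord], rec: LogRecord) -> None:
    """Apply one log record to an in-memory page with last-writer-wins.

    The record replaces the stored version only if it is strictly newer,
    so applying the same record twice has no further effect (idempotent)
    and the final state does not depend on delivery order.
    """
    current = page.get(rec.key)
    if current is None or (rec.timestamp, rec.record_id) > (
        current.timestamp,
        current.record_id,
    ):
        page[rec.key] = rec


def checkpoint(page: Dict[str, LogRecord], delivered: Iterable[LogRecord]) -> Dict[str, LogRecord]:
    """Fold every delivered record into the page and return it."""
    for rec in delivered:
        apply_record(page, rec)
    return page


r1 = LogRecord("r1", "a", "old", 1)
r2 = LogRecord("r2", "a", "new", 2)
r3 = LogRecord("r3", "b", "x", 1)

# One delivery in log order, one reordered with duplicates.
in_order = checkpoint({}, [r1, r2, r3])
shuffled = checkpoint({}, [r2, r3, r1, r2, r1])
```

Both deliveries end with the same page contents, which is the property a checkpointing client needs when the queue offers no ordering guarantee. The cost is the one the summary already names: concurrent writers to the same key are resolved by whoever has the later timestamp, not by any ANSI SQL isolation level.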