How Airbnb Runs Distributed Databases on Kubernetes at Scale

Airbnb deployed distributed SQL databases across multiple Kubernetes clusters, each mapped to a different AWS Availability Zone (AZ), to achieve high availability and fault tolerance. The team built custom Kubernetes operators to manage stateful workloads safely, coordinate node replacements, and maintain quorum during failures. Persistent storage is backed by AWS EBS volumes managed through PersistentVolumeClaims (PVCs), and techniques such as replica reads and stale reads mitigate latency while maintaining consistency. Their largest production cluster handles 3 million queries per second (QPS) across 150 nodes with 300 TB of data and achieves 99.95% availability through carefully sequenced upgrades, canary deployments, and overprovisioning for resilience.
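Maintaining quorum during node replacements comes down to a majority check that an operator can run before it takes a node down. The following is a minimal Python sketch, assuming a Raft-style model in which each data range has a small, typically odd, number of replicas spread across Availability Zones. The `Replica` type, the `can_replace_node` and `survives_zone_loss` helpers, and the node, range, and zone names are all hypothetical illustrations, not Airbnb's operator API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a data range (hypothetical model of a Raft-style replica group)."""
    range_id: str
    node: str
    zone: str
    healthy: bool = True


def has_quorum(replicas, unavailable_nodes=frozenset(), unavailable_zones=frozenset()):
    """Return True if every range keeps a strict majority of healthy replicas
    after excluding the given nodes and zones."""
    total = Counter(r.range_id for r in replicas)
    alive = Counter(
        r.range_id
        for r in replicas
        if r.healthy
        and r.node not in unavailable_nodes
        and r.zone not in unavailable_zones
    )
    # Strict majority: more than half of the configured replicas must remain.
    return all(alive[range_id] > count // 2 for range_id, count in total.items())


def can_replace_node(replicas, node):
    """Operator-style guard (hypothetical): allow taking `node` down only if quorum survives."""
    return has_quorum(replicas, unavailable_nodes={node})


def survives_zone_loss(replicas):
    """Check that losing any single zone still leaves every range with a majority."""
    zones = {r.zone for r in replicas}
    return all(has_quorum(replicas, unavailable_zones={z}) for z in zones)


if __name__ == "__main__":
    # Hypothetical layout: two ranges, three replicas each, one replica per zone.
    replicas = [
        Replica("r1", "node-a", "us-east-1a"),
        Replica("r1", "node-b", "us-east-1b"),
        Replica("r1", "node-c", "us-east-1c"),
        Replica("r2", "node-a", "us-east-1a"),
        Replica("r2", "node-b", "us-east-1b"),
        Replica("r2", "node-d", "us-east-1c", healthy=False),
    ]
    print(can_replace_node(replicas, "node-c"))  # True
    print(can_replace_node(replicas, "node-a"))  # False: r2 would drop to 1 of 3
    print(survives_zone_loss(replicas))          # False
```

Counting replicas that are already unhealthy against quorum is what makes the guard useful: `node-a` looks safe in isolation, but replacing it is blocked because range `r2` has already lost a replica. This is one reason an operator would coordinate replacements rather than let independent events take nodes down at the same time.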


Oct 01, 2025 • 14 min read • From blog.bytebytego.com

Table of contents

Running Databases on Kubernetes
Node Replacement Coordination
Kubernetes Upgrades
Multi-Cluster Deployment for Fault Tolerance
Leveraging AWS EBS
Conclusion