How Flipkart Built a Highly Available MySQL Cluster for 150+ Million Users
Flipkart built Altair, an internally managed MySQL service that maintains high availability for 150+ million daily users through a primary-replica architecture with automated failover. A three-layer monitoring system (agent, monitor, orchestrator) detects failures, filters out false positives, and executes failovers with minimal data loss. Altair uses asynchronous replication, prioritizing write availability over strong consistency. It uses DNS-based service discovery to make failovers seamless for clients and includes multiple safeguards against split-brain scenarios. The design balances operational simplicity with reliability, achieving near five-nines availability while managing thousands of database clusters across Flipkart's microservices infrastructure.
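
The failover flow described above can be illustrated with a small in-memory sketch. This is not Altair's code: the names `Node`, `Cluster`, `primary_is_down`, and `fail_over`, the quorum rule, and the `orders.db.internal` endpoint are all hypothetical, and a plain dictionary stands in for the DNS record. It only shows the general shape of the idea: require agreement among several monitors before declaring the primary dead, fence the old primary before promoting anything, pick the most caught-up replica, and repoint the service name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True     # what a local agent would report (assumed)
    applied_pos: int = 0     # replication position applied so far
    read_only: bool = True   # replicas reject writes


@dataclass
class Cluster:
    primary: Node
    replicas: list[Node]
    # Stand-in for a DNS record: service name -> current primary host.
    dns: dict[str, str] = field(default_factory=dict)


def primary_is_down(monitor_votes: list[bool], quorum: int) -> bool:
    """Declare failure only when at least `quorum` monitors report it.

    A single monitor losing its network path must not trigger a failover.
    """
    return sum(monitor_votes) >= quorum


def fail_over(cluster: Cluster, endpoint: str) -> Node:
    """Promote the most caught-up healthy replica and repoint the endpoint."""
    candidates = [r for r in cluster.replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")

    # Fence the old primary first so two nodes never accept writes at once.
    old_primary = cluster.primary
    old_primary.read_only = True

    # The highest applied position loses the fewest transactions
    # under asynchronous replication.
    new_primary = max(candidates, key=lambda r: r.applied_pos)
    new_primary.read_only = False

    # The old primary is deliberately not re-added as a replica here.
    cluster.replicas = [r for r in cluster.replicas if r is not new_primary]
    cluster.primary = new_primary
    cluster.dns[endpoint] = new_primary.name
    return new_primary


if __name__ == "__main__":
    old = Node("db-1", healthy=False, applied_pos=120, read_only=False)
    cluster = Cluster(
        primary=old,
        replicas=[Node("db-2", applied_pos=118), Node("db-3", applied_pos=115)],
        dns={"orders.db.internal": "db-1"},
    )
    # Two of three monitors agree the primary is unreachable.
    if primary_is_down([True, True, False], quorum=2):
        promoted = fail_over(cluster, "orders.db.internal")
        print(promoted.name, cluster.dns)
```

In this example the old primary had applied position 120 while the best replica had only reached 118, so the two unreplicated transactions are lost on promotion. That gap is the cost of favoring write availability with asynchronous replication, and choosing the most advanced replica is what keeps it small.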