Flipkart built Altair, an internally managed MySQL service that maintains high availability for 150+ million daily users through automated failover and a primary-replica architecture. The system uses a three-layered monitoring approach (agent, monitor, orchestrator) to detect failures, prevent false positives, and execute failovers with minimal data loss. Altair prioritizes write availability over strong consistency by using asynchronous replication, implements DNS-based service discovery for seamless failovers, and includes multiple safeguards against split-brain scenarios. The design balances operational simplicity with reliability, achieving near five-nines availability while managing thousands of database clusters across Flipkart's microservices infrastructure.
Table of contents
High Availability Model at Flipkart
End-to-End Failure Workflow
Service Discovery
Split-Brain Risks
Failure Scenarios and How Altair Handles Them
Design Highlights and Trade-Offs
Conclusion