InfoQ

Uber redesigned its MySQL infrastructure by replacing external failover mechanisms with MySQL Group Replication (MGR), which coordinates nodes through a Paxos-based consensus protocol. The new architecture embeds leader election and failure detection directly in the database layer using three-node MGR clusters, reducing failover time from minutes to under 10 seconds. Read replicas fan out from the secondaries, so read scaling is decoupled from write availability. Flow control within MGR guards against replication lag and errant GTIDs. The rollout was automated through a control plane that handles onboarding, node replacement, topology rebalancing, and quorum protection. The main trade-off is a slight increase in write latency (hundreds of microseconds), in exchange for a dramatic drop in write unavailability during failures. Uber chose single-primary mode over multi-primary for operational simplicity.
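To make the quorum reasoning behind a three-node group concrete, here is a minimal Python sketch of the majority arithmetic. It is an illustration only, not Uber's control plane code: the function names (`majority`, `tolerated_failures`, `safe_to_remove`) are hypothetical, and the guard assumes a simple rule that a node may be taken offline only if the remaining online members still form a majority of the configured group.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority of the group."""
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """Number of members that can fail while a majority still remains."""
    return group_size - majority(group_size)


def safe_to_remove(group_size: int, online: int, removing: int = 1) -> bool:
    """Hypothetical quorum-protection guard (an assumption, not Uber's code).

    Allows taking `removing` nodes offline only if the members left online
    still form a majority of the configured group size.
    """
    return online - removing >= majority(group_size)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"group of {n}: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
    # Healthy three-node group: one node can be replaced safely.
    print(safe_to_remove(group_size=3, online=3))
    # One node is already down: removing another would lose the majority.
    print(safe_to_remove(group_size=3, online=2))
```

Under these assumptions, a three-node group needs two members to agree and survives the loss of one, which is why a control plane would want to block a second removal while one node is already down.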

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture