We understand the impact outages have on our customers and are sharing details on the stabilization work we’re prioritizing right now.

GitHub Blog

GitHub's CTO details three major availability incidents, on February 2, February 9, and March 5, and explains root causes that include a database cluster overloaded by compounding load factors (a cache TTL change, increased API traffic from client apps, and new model releases); a telemetry gap that caused security policies to block VM metadata access for Actions hosted runners; and a Redis failover that left a cluster with no writable primary. Contributing factors included insufficient architectural isolation, inadequate load shedding, and monitoring gaps. Remediation includes redesigning the user cache system, isolating critical dependencies, accelerating the Azure migration (currently at 12.5% of traffic, targeting 50% by July), and breaking the monolith apart into isolated services.

Addressing GitHub’s recent availability issues