Railway suffered an ~8-hour platform-wide outage on May 19, 2026 after Google Cloud incorrectly suspended their production account via an automated action. The suspension took down Railway's control plane, API, dashboard, and GCP-hosted compute. Because Railway's edge proxies depend on a GCP-hosted control plane to populate routing tables, once cached routes expired, workloads on Railway Metal and AWS also became unreachable, cascading the outage beyond GCP. Recovery was sequential — persistent disks, compute instances, and networking each required separate restoration — extending the incident by several hours. GitHub rate-limiting of OAuth and webhook integrations added further complications during recovery. Railway is now working to eliminate the single-point-of-dependency on GCP for route discovery, extend HA database shards across AWS and Metal, and remove GCP from the data plane's hot path to prevent a similar cascade in the future.

8m read timeFrom blog.railway.com
Post cover image
Table of contents
ImpactIncident Timeline

Sort: