Google’s data center and global networks can distribute AI workloads across campuses, to create a massive-scale, pooled hypercomputing resource.

Google Cloud Platform provides a suite of cloud computing services for building, deploying, and managing applications and infrastructure on Google's global network. Developers can learn about cloud-native development, machine learning, and big data analytics to leverage GCP's scalable and reliable cloud infrastructure for their projects.

Google Cloud

Google details how it has evolved its network infrastructure to meet the demands of large-scale AI workloads. The post covers three pillars: the fabric inside the AI Hypercomputer (including the new Virgo Network scale-out fabric linking up to 134,000 TPU 8t chips with 47 petabits/sec bandwidth), the fabric across campuses over WAN (with 10x traffic growth since 2020 and a new AI-native Cloud Interconnect scaling to petabit capacity), and the global network for AI inference (spanning 10M+ km of fiber, 43 cloud regions, 200+ edge locations). Key innovations include a decoupled three-domain campus architecture, autonomous fault detection and hang detection for training jobs, sub-millisecond telemetry for microburst detection, and multi-shard global network isolation for beyond-nines reliability.

Data center and global networks built for AI era