Google Cloud has announced the preview of multi-cluster GKE Inference Gateway, designed to scale AI/ML inference workloads across multiple GKE clusters and Google Cloud regions. It addresses limitations of single-cluster deployments such as availability risks, GPU/TPU resource silos, scalability caps, and latency for global
Sort: