Google Cloud has launched U.S. and EU multi-region endpoints for Claude on Vertex AI in public preview. These endpoints pool capacity across multiple regions within a single geography and route requests automatically, which improves reliability while keeping data within compliance boundaries. They serve as a middle ground between single-region endpoints (low latency, strict location) and global endpoints (maximum capacity, no geographic constraint). Multi-region endpoints support prompt caching with intelligent routing to cached regions, use separate quota pools, and require only a simple API URL change to adopt: replace a specific region such as us-central1 with us or eu.
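As a rough illustration of the URL change described above, the sketch below maps a single region to its multi-region location and fills it into an endpoint URL. The URL template, the helper names, and the project and model placeholders are assumptions made for illustration; the actual multi-region hostname may differ, so check the Vertex AI documentation before relying on it.

```python
# Hypothetical URL template modeled on the regional Vertex AI pattern.
# ASSUMPTION: the real multi-region hostname may not follow this shape.
URL_TEMPLATE = (
    "https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
    "/locations/{location}/publishers/anthropic/models/{model}:rawPredict"
)

# Region-name prefix -> multi-region location named in the announcement.
GEOGRAPHY_BY_PREFIX = {"us": "us", "europe": "eu"}


def to_multi_region(region: str) -> str:
    """Return 'us' or 'eu' for a single region such as 'us-central1'."""
    prefix = region.split("-", 1)[0]
    try:
        return GEOGRAPHY_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"no multi-region location known for {region!r}") from None


def build_endpoint(project: str, model: str, location: str) -> str:
    """Fill the (hypothetical) template with a region or multi-region location."""
    return URL_TEMPLATE.format(project=project, model=model, location=location)


if __name__ == "__main__":
    # 'my-project' and 'my-claude-model' are placeholders, not real identifiers.
    single = build_endpoint("my-project", "my-claude-model", "us-central1")
    multi = build_endpoint("my-project", "my-claude-model", to_multi_region("us-central1"))
    print(single)
    print(multi)
```

Only the location string changes between the two URLs; the request body and model identifier stay the same in this sketch.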
Table of contents
What are multi-region endpoints and when should you use them?
Comparing your endpoint options
Full support for prompt caching
Best practices
How to get started