Cloudflare is evolving AI Gateway into a unified inference layer that lets developers access 70+ models from 12+ providers through a single API and one set of credits. Key updates include calling third-party models (OpenAI, Anthropic, Google, etc.) through the same AI.run() Workers binding with a one-line switch, centralized cost monitoring with custom metadata breakdowns, automatic failover routing when a provider goes down, and streaming response buffering for resilient long-running agents. Cloudflare is also enabling developers to bring their own fine-tuned models to Workers AI via Replicate's Cog containerization technology. The platform now includes multimodal models (image, video, speech) and large agent-optimized models like Kimi K2.5, all served from Cloudflare's 330-city global network to minimize latency.
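To make the "one-line switch" concrete, here is a minimal sketch of a call site where only the model string changes between a Workers AI model and a third-party one. The `AiBinding` interface, the `ask` helper, the `fakeAi` stand-in, and both model identifiers are assumptions made for illustration. They are not taken from Cloudflare's type definitions or model catalog, and the real binding may expect different input fields.

```typescript
// Minimal shape of the binding used in this sketch. This is an assumption,
// not the official type from Cloudflare's Workers type definitions.
interface AiBinding {
  run(model: string, inputs: { prompt: string }): Promise<unknown>;
}

// Hypothetical model identifiers, for illustration only.
const WORKERS_AI_MODEL = "@cf/example/hosted-model";
const THIRD_PARTY_MODEL = "example-provider/example-model";

// The call site is identical for both models; only the model string differs.
async function ask(ai: AiBinding, model: string, prompt: string): Promise<unknown> {
  return ai.run(model, { prompt });
}

// Stand-in binding that echoes the request, so the sketch runs without
// a Cloudflare account. In a Worker, the binding would come from `env`.
const fakeAi: AiBinding = {
  async run(model, inputs) {
    return { model, echoed: inputs.prompt };
  },
};
```

In a Worker, the same helper would presumably be called as `ask(env.AI, THIRD_PARTY_MODEL, "...")`, so that moving between providers is a change to one constant instead of a new client library.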
Table of contents
One catalog, one unified endpoint
Bring your own model
The fast path to first token
Built for reliability with automatic failover
Replicate
Get started
Watch on Cloudflare TV