Ollama has introduced Turbo, a $20/month cloud service that runs large language models on datacenter-grade hardware for faster inference. The service lets users run larger models that don't fit on consumer GPUs, while preserving privacy by not retaining user data. Turbo works with the existing Ollama CLI, API, and JavaScript/Python libraries, and currently offers the gpt-oss-20b and gpt-oss-120b models in preview, with usage limits and US-based infrastructure.
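Because Turbo reuses the standard Ollama chat API, a request could look like the sketch below. The `/api/chat` endpoint and request shape follow the regular Ollama API; the `https://ollama.com` host URL, the `Authorization` header, and the `OLLAMA_API_KEY` environment variable are assumptions for illustration, so check ollama.com for the exact connection details.

```python
import json
import os
import urllib.request

# Request body in the standard Ollama chat format; the model name
# comes from the Turbo preview lineup described in the article.
payload = {
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,
}

def ask_turbo(api_key: str, host: str = "https://ollama.com") -> str:
    """Send one chat request to the (assumed) Turbo endpoint."""
    req = urllib.request.Request(
        f"{host}/api/chat",  # assumed host; endpoint matches the Ollama API
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Only hit the network when a key is actually configured.
if os.environ.get("OLLAMA_API_KEY"):
    print(ask_turbo(os.environ["OLLAMA_API_KEY"]))
```

The same request shape works with the official JavaScript/Python client libraries, which the article says support Turbo directly.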

2 min read · From ollama.com
Table of contents

- What is Turbo?
- Which models are available in Turbo?
- Does Turbo work with Ollama's CLI?
- Does Turbo work with Ollama's API and JavaScript/Python libraries?
- What data do you retain in Turbo mode?
- Where is the hardware that powers Turbo located?
- What are the usage limits for Turbo?
