OpenAI has released GPT-OSS-20B and GPT-OSS-120B, its first open-weight large language models since GPT-2, under the Apache 2.0 license. The 20B model runs on consumer hardware with 16-32 GB of RAM, while the 120B model requires data-center GPUs. Both support chain-of-thought reasoning, configurable reasoning-effort levels, fine-tuning, and 128k-token context windows. You can try them via OpenAI's playground, Hugging Face, or third-party API providers, and run them locally with LM Studio, Ollama, or vLLM. Dedicated GPU cloud hosting runs roughly $4,000-$12,000 per month, with cheaper shared options available.
Table of contents
- Key Specs and Features
- Where to try GPT-OSS
- Running GPT-OSS Locally
- Self-hosting on Cloud
- Conclusion
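Since local runners like vLLM and LM Studio expose an OpenAI-compatible HTTP API, here is a minimal sketch of the request body you would send to such a server. The model name `gpt-oss-20b` and the `reasoning_effort` field are assumptions based on common OpenAI-style conventions; check your server's documentation for the exact names it accepts.

```python
import json

# Sketch of a request body for an OpenAI-compatible chat endpoint
# (e.g. a local vLLM or LM Studio server). The model name and the
# "reasoning_effort" field are assumptions, not confirmed parameters.
def build_request(prompt, effort="medium", model="gpt-oss-20b"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "low" | "medium" | "high" (assumed values)
    }

body = build_request("Summarize the Apache 2.0 license.", effort="low")
print(json.dumps(body, indent=2))
```

You would POST this JSON to the server's `/v1/chat/completions` endpoint; raising the effort level trades latency for longer chain-of-thought reasoning.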