OpenAI's gpt-oss models use MXFP4, a 4-bit floating-point data type that cuts inference memory and compute costs by roughly 75% compared to traditional BF16 models. MXFP4 uses micro-scaling blocks, in which small groups of values share a single scaling factor, to maintain precision while dramatically shrinking the weights. This allows a 120-billion-parameter model to run in 80 GB of VRAM instead of the roughly 240 GB the same weights would need in BF16.
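To make the idea concrete, here is a minimal sketch of MXFP4-style block quantization in NumPy. It is illustrative only, not the actual gpt-oss or Triton kernel code: it assumes the standard MX layout of 32-element blocks, each storing FP4 (E2M1) element values from the set {0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6} plus one shared power-of-two scale per block. The function names (`quantize_block`, `dequantize_block`) and the scale-selection rule are hypothetical choices for the sketch.

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes; a sign bit gives the negative values.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_VALUES = np.concatenate([-FP4_GRID[::-1], FP4_GRID])  # all signed codes
BLOCK = 32  # elements sharing one scale in the MX format


def quantize_block(x):
    """Quantize one block of 32 floats to (shared exponent, fp4 values).

    Sketch only: picks a power-of-two scale so the largest element fits
    within the FP4 range, then rounds each scaled element to the nearest
    representable FP4 value.
    """
    amax = np.max(np.abs(x))
    scale_exp = 0 if amax == 0 else int(np.ceil(np.log2(amax / FP4_GRID[-1])))
    scaled = x / (2.0 ** scale_exp)
    # Nearest-value rounding onto the FP4 grid.
    idx = np.abs(scaled[:, None] - FP4_VALUES).argmin(axis=1)
    return scale_exp, FP4_VALUES[idx]


def dequantize_block(scale_exp, q):
    """Recover approximate values: fp4 value times the shared 2**exponent scale."""
    return q * (2.0 ** scale_exp)


# Usage: quantize a random block and check the reconstruction error.
x = np.random.randn(BLOCK).astype(np.float32)
exp, q = quantize_block(x)
x_hat = dequantize_block(exp, q)
print("max abs error:", np.max(np.abs(x - x_hat)))
```

The storage math follows directly: each element costs 4 bits, and the shared scale adds 8 bits per 32-element block, so MXFP4 weights take about 4.25 bits per parameter versus 16 bits for BF16, which is where the roughly 75% reduction comes from.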