OpenAI's gpt-oss models use MXFP4, a 4-bit floating-point data type that cuts weight memory by roughly 75% compared to BF16. MXFP4 stores weights in small micro-scaling blocks (32 elements in the OCP Microscaling specification) that share a single power-of-two scale factor, preserving dynamic range even though each element occupies only 4 bits (E2M1: one sign bit, two exponent bits, one mantissa bit). This is what lets a 120-billion-parameter model fit on a single 80 GB GPU rather than requiring several. Because token generation is typically memory-bandwidth bound, the smaller weights can also translate into substantially faster decoding and lower hardware costs, though the reduced precision may cost some output quality compared to higher-precision formats.
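To make the block-scaling idea concrete, here is a minimal NumPy sketch of MXFP4-style quantization. It is an illustrative simulation, not OpenAI's implementation: it picks a shared power-of-two scale per 32-element block, then snaps each scaled element to the nearest value representable in FP4 E2M1 (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6).

```python
import numpy as np

# The 8 non-negative values representable in FP4 E2M1
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
# Full signed grid (duplicate zero is harmless for nearest-value snapping)
GRID = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])

def mxfp4_quant_dequant(x, block=32):
    """Simulate MXFP4: per-block shared power-of-two scale + snap to the E2M1 grid."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        amax = np.abs(blk).max()
        # Shared scale: power of two chosen so the block's largest magnitude
        # lands near 6.0, the largest representable E2M1 magnitude.
        scale = 2.0 ** np.floor(np.log2(amax / 6.0)) if amax > 0 else 1.0
        scaled = blk / scale
        # Snap each scaled element to its nearest representable FP4 value
        idx = np.abs(scaled[:, None] - GRID[None, :]).argmin(axis=1)
        out[i:i + block] = GRID[idx] * scale
    return out
```

With 4 bits per element plus one 8-bit scale shared across 32 elements, storage works out to 4.25 bits per weight, which is where the roughly-75%-smaller-than-BF16 figure comes from.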