NVFP4 is NVIDIA's 4-bit floating-point quantization format that enables efficient deployment of large language models with near-baseline accuracy. Red Hat has released NVFP4-quantized versions of popular models ranging from 8B to 400B+ parameters, achieving 99% accuracy recovery on large models (70B-235B), 97-99% on mid-size
