FP8 training requires sophisticated scaling strategies to maintain numerical stability and accuracy. Per-tensor scaling assigns a unique scaling factor to each tensor: delayed scaling derives the factor from a history of past amax values, while current scaling adapts it in real time from the tensor being quantized. Per-block scaling divides tensors into smaller segments, each with its own scaling factor.
Table of contents
- Per-tensor scaling
- Per-tensor delayed scaling
- Per-tensor current scaling
- Per-block scaling
- What is Micro-Scaling FP8?
- How does MXFP8 work?
- Block scaling
- How does block scaling work?
- Recipes in the NVIDIA NeMo Framework
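To make the distinction between these strategies concrete, here is a minimal PyTorch sketch of how each scale factor could be computed. This is an illustration, not the Transformer Engine or NeMo API: the function names, the `FP8_E4M3_MAX` constant, and the block size are assumptions chosen for clarity.

```python
import torch

# Assumption: quantizing to FP8 E4M3, whose maximum representable magnitude is 448.
FP8_E4M3_MAX = 448.0

def current_scale(tensor: torch.Tensor) -> torch.Tensor:
    """Per-tensor current scaling: derive the scale from this tensor's own amax,
    computed in real time just before quantization."""
    amax = tensor.abs().max()
    return FP8_E4M3_MAX / torch.clamp(amax, min=1e-12)

def delayed_scale(amax_history: torch.Tensor) -> torch.Tensor:
    """Per-tensor delayed scaling: derive the scale from a rolling history of amax
    values recorded in previous iterations, avoiding an extra pass over the
    current tensor."""
    amax = amax_history.max()
    return FP8_E4M3_MAX / torch.clamp(amax, min=1e-12)

def per_block_scales(tensor: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Per-block scaling: one scale per contiguous block of `block` elements,
    so a single outlier only degrades the precision of its own block."""
    flat = tensor.flatten()
    pad = (-flat.numel()) % block          # pad so the length divides evenly
    flat = torch.nn.functional.pad(flat, (0, pad))
    amax = flat.view(-1, block).abs().amax(dim=1)
    return FP8_E4M3_MAX / torch.clamp(amax, min=1e-12)
```

In this sketch, the trade-off is visible directly: current scaling reads the live tensor (accurate but adds a reduction), delayed scaling reuses stored history (cheap but can lag behind sudden outliers), and per-block scaling pays for many small scale factors in exchange for finer-grained dynamic range.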