DeepSeek AI Releases DeepGEMM: An FP8 GEMM Library that Supports both Dense and MoE GEMMs Powering V3/R1 Training and Inference

DeepSeek AI has introduced DeepGEMM, a library for efficient FP8 General Matrix Multiplication (GEMM). It supports both standard dense GEMMs and the grouped GEMMs used in Mixture-of-Experts (MoE) models, and it is optimized for the tensor cores of NVIDIA Hopper GPUs. DeepGEMM compiles its kernels Just-In-Time (JIT), which simplifies integration and helps maximize hardware utilization. By tackling two common bottlenecks of FP8 matrix multiplication, memory bandwidth and numerical precision, it delivers significant speedups, making it a valuable tool for deep learning training and inference pipelines.
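FP8 keeps only a few mantissa bits and a narrow range, so a single large value can wipe out small ones when an entire tensor shares one scale. DeepGEMM is described as using fine-grained scaling in the style of DeepSeek-V3, where small blocks of values each get their own scale. The sketch below is an illustration of that idea, not DeepGEMM's code or API: it simulates FP8 E4M3 rounding in pure Python, gives each block of K elements its own scale, and accumulates the rescaled block partial sums in double precision. The function names (`quantize_e4m3`, `quantize_blocks`, `fp8_gemm_blockwise`) and the block size of 4 are hypothetical choices for readability; real GPU kernels use much larger tiles (for example 128) on tensor cores.

```python
import math

E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 format


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (saturating at +/-448), returned as a float."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # -6 is the smallest normal exponent; below it values are subnormal
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> 8 representable steps per power of two
    q = round(a / step) * step    # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_blocks(vec, block):
    """Split vec into chunks of `block` elements, each with its own scale factor (hypothetical helper)."""
    q_blocks, scales = [], []
    for start in range(0, len(vec), block):
        chunk = vec[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        q_blocks.append([quantize_e4m3(v / scale) for v in chunk])
        scales.append(scale)
    return q_blocks, scales


def fp8_gemm_blockwise(a, b, block):
    """C = A @ B with A (M x K) and B (K x N) quantized to simulated FP8, one scale per K-block."""
    cols = [list(col) for col in zip(*b)]          # columns of B, so both operands are split along K
    qa = [quantize_blocks(row, block) for row in a]
    qb = [quantize_blocks(col, block) for col in cols]
    out = []
    for a_blocks, a_scales in qa:
        out_row = []
        for b_blocks, b_scales in qb:
            acc = 0.0  # higher-precision (double) accumulator
            for xa, xb, sa, sb in zip(a_blocks, b_blocks, a_scales, b_scales):
                partial = sum(x * y for x, y in zip(xa, xb))  # dot product of one K-block
                acc += partial * sa * sb                      # rescale, then accumulate
            out_row.append(acc)
        out.append(out_row)
    return out


# One row of A has a large outlier in its first K-block and small values in its second.
A = [[1000.0, 1.0, 1.0, 1.0, 0.001, 0.002, 0.003, 0.004]]
B = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
exact = 0.001 + 0.002 + 0.003 + 0.004

per_row = fp8_gemm_blockwise(A, B, block=8)[0][0]    # one scale for the whole row
blockwise = fp8_gemm_blockwise(A, B, block=4)[0][0]  # one scale per 4-element block
print(f"exact={exact:.6f} per_row={per_row:.6f} blockwise={blockwise:.6f}")
```

With a single scale per row, the outlier 1000 forces the small values down to zero or the smallest subnormal, so the per-row result falls well short of the exact 0.01. Giving the second block its own scale keeps those values in FP8's useful range, and the block-wise result lands much closer to the exact answer.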