NVIDIA Megatron-Core, the successor to Megatron-LM, is a PyTorch-based library designed for efficient large-scale training of transformer models. It offers GPU-optimized techniques, modular APIs, and support for multimodal training. Key features include activation recomputation, distributed checkpointing, and expert parallelism for mixture-of-experts models.
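To make the activation-recomputation idea concrete, here is a minimal sketch using plain PyTorch's `torch.utils.checkpoint` API. Megatron-Core ships its own optimized recomputation strategies (this is not its API), but the underlying trade-off is the same: intermediate activations are not stored during the forward pass and are recomputed during backward, trading extra compute for lower memory. The `Block` module and tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A hypothetical feed-forward residual block used only for illustration."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(x)

dim = 64
blocks = nn.ModuleList(Block(dim) for _ in range(4))
x = torch.randn(8, dim, requires_grad=True)

h = x
for blk in blocks:
    # Activations inside blk are not kept; they are recomputed in backward.
    h = checkpoint(blk, h, use_reentrant=False)
h.sum().backward()
```

After `backward()`, `x.grad` is populated exactly as it would be without checkpointing; only the memory/compute trade-off differs.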

From developer.nvidia.com
Table of contents

- NVIDIA Megatron-Core
- Multimodal training is now supported in Megatron-Core
- Training throughput optimization for mixture of experts
- Fast distributed checkpointing for better training resiliency
- Improved scalability
- Get started
