Dynamic Context Parallelism (Dynamic-CP) is a scheduling approach in NVIDIA Megatron Core that addresses computational inefficiencies in training large language models and diffusion transformers with variable-length sequences. Unlike static context parallelism, which fixes the CP size according to the longest sequence in a batch, Dynamic-CP adaptively selects the CP size for each microbatch based on sequence-packing strategies. The system uses a solver that models compute and communication costs to jointly optimize packing and CP sizing while respecting GPU memory constraints. Framework modifications include building multiple CP groups per rank, dynamic rescheduling with the THD layout, and asynchronous solver execution to avoid adding training overhead. Benchmarks show a 1.48x speedup on GitHub datasets and over 35% improvement in industrial multi-GPU environments, achieved by reducing data-parallel imbalance and unnecessary communication overhead.
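To make the idea concrete, the following is a minimal, hypothetical sketch of the per-microbatch decision: a toy cost model estimates compute (which shrinks as CP size grows) and communication (which grows with CP size) for a packed sequence, then the memory-feasible CP size with the lowest modeled cost is chosen. The names and constants here (`CostModel`, `choose_cp_size`, the candidate CP sizes) are illustrative assumptions, not Megatron Core APIs or the actual solver.

```python
# Hypothetical sketch of per-microbatch CP-size selection.
# Cost constants and the linear/quadratic scaling terms are assumptions,
# not the cost model described in the article.
from dataclasses import dataclass


@dataclass
class CostModel:
    flops_per_token_sq: float = 1.0   # attention compute grows ~quadratically with packed length
    comm_per_token: float = 0.05      # per-token communication cost per extra CP peer (assumed)
    mem_per_token: float = 1.0        # activation memory per token on one rank (assumed units)
    mem_budget: float = 4096.0        # per-GPU activation budget (assumed units)

    def step_cost(self, packed_len: int, cp_size: int) -> float:
        """Estimated time for one packed microbatch at a given CP size."""
        compute = self.flops_per_token_sq * packed_len ** 2 / cp_size
        comm = self.comm_per_token * packed_len * (cp_size - 1)
        return compute + comm

    def fits(self, packed_len: int, cp_size: int) -> bool:
        """Check the per-rank activation-memory constraint."""
        return self.mem_per_token * packed_len / cp_size <= self.mem_budget


def choose_cp_size(packed_len: int, model: CostModel, candidates=(1, 2, 4, 8)) -> int:
    """Pick the memory-feasible CP size with the lowest modeled cost."""
    feasible = [cp for cp in candidates if model.fits(packed_len, cp)]
    if not feasible:
        raise ValueError(f"No CP size in {candidates} fits packed length {packed_len}")
    return min(feasible, key=lambda cp: model.step_cost(packed_len, cp))


if __name__ == "__main__":
    model = CostModel()
    for length in (1024, 8192, 32768):
        print(f"packed length {length:>6} -> CP size {choose_cp_size(length, model)}")
```

In this toy version, short packed sequences stay at CP size 1 (no communication overhead), while longer ones are forced to larger CP sizes by the memory constraint, mirroring the trade-off the actual solver balances across microbatches.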
Table of contents
- Megatron Core framework modifications for supporting Dynamic-CP
- Data scheduler modeling
- Collaboration of cost model, solver, and simulator
- Modeling process and bi-objective balance
- Zero-overhead execution
- Benchmark results
- Learn more