NVIDIA's BioNeMo team has developed a context parallelism (CP) framework called Fold-CP that enables biomolecular modeling of large protein complexes by sharding a single molecular system across multiple GPUs. Traditional approaches required fragmenting proteins into smaller pieces, which loses long-range structural information. The CP framework combines 2D tiling of pair representations, computation overlapped with asynchronous communication, and halo-exchange-based distributed primitives to achieve linear memory scaling. With 256 H100 GPUs, the system can handle roughly 20,000 tokens; on just four H100 GPUs, it folded a 3,605-residue complex spanning four chains in under 5 minutes. Partners such as Rezo Therapeutics, Proxima, and Earendil Labs have already integrated the framework for drug discovery applications. The open-source Boltz CP code is available on GitHub.
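To make the 2D tiling idea concrete, here is a minimal sketch of how an N x N pair representation could be split across a square grid of devices, and how the per-device footprint of that tensor shrinks as the grid grows. This is not the Fold-CP or Boltz CP code: the function names `tile_bounds` and `pair_bytes_per_device`, the 128-channel pair width, and the 2-byte element size are all assumptions chosen for illustration, and the sketch ignores halo regions, activations, and every other tensor in the model.

```python
import math


def tile_bounds(n_tokens: int, grid: int, row: int, col: int) -> tuple[range, range]:
    """Token ranges (rows, cols) of the pair tile owned by device (row, col).

    Hypothetical helper: assumes a square grid x grid device layout and
    equal-sized tiles, with the last tile in each dimension possibly smaller.
    """
    tile = math.ceil(n_tokens / grid)
    rows = range(min(row * tile, n_tokens), min((row + 1) * tile, n_tokens))
    cols = range(min(col * tile, n_tokens), min((col + 1) * tile, n_tokens))
    return rows, cols


def pair_bytes_per_device(
    n_tokens: int,
    grid: int,
    channels: int = 128,       # assumed pair-channel width, for illustration only
    bytes_per_elem: int = 2,   # assumed 16-bit storage
) -> int:
    """Upper bound on the bytes one device holds for its pair tile."""
    tile = math.ceil(n_tokens / grid)
    return tile * tile * channels * bytes_per_elem


if __name__ == "__main__":
    n = 20_000
    for grid in (1, 2, 16):  # 1, 4, and 256 devices
        gib = pair_bytes_per_device(n, grid) / 2**30
        print(f"{grid * grid:>3} devices: {gib:8.2f} GiB of pair tensor per device")
```

Under these assumptions, the pair tile held by each device shrinks in proportion to the number of devices, which is the intuition behind sharding one system across a grid instead of fragmenting the protein. How closely the real framework's memory behavior tracks this simple estimate depends on details the summary does not give.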
Table of contents
Sharding a single large molecular system across multiple GPUs
BioNeMo context parallelism implementation
Unlocking token scaling for structural biology
Get started with context parallelism for biomolecular modeling