NVIDIA researchers extended the AlphaFold Protein Structure Database (AFDB) with large-scale homomeric and heteromeric protein complex predictions using a high-throughput pipeline. The pipeline decouples MSA generation (via MMseqs2-GPU) from structure inference (via ColabFold and OpenFold, accelerated with TensorRT and cuEquivariance) and orchestrates workloads across NVIDIA H100 DGX SuperPOD clusters using SLURM. Key optimizations include staggering colabfold_search processes to reduce GPU idle time, packing sequences to minimize JAX recompilations, and selecting candidate interactions based on evidence from STRING. Accuracy validation on 125 PDB homodimers shows that OpenFold with TensorRT/cuEquivariance matches the ColabFold baseline in quality (75.41% usable vs. 72.95%; mean DockQ 0.647 vs. 0.637). The resulting high-confidence complex structures are being made available through the AlphaFold Database.
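The summary does not say how sequences are packed, so the following is only an illustrative sketch of one common way to limit JAX recompilations: group inputs by length rounded up to a fixed bucket size, so that every sequence in a group can be padded to the same shape and reuse one compiled function. The function name `pack_by_padded_length` and the `bucket_size` parameter are hypothetical and are not taken from the pipeline itself.

```python
from collections import defaultdict
from typing import Dict, List


def pack_by_padded_length(
    sequences: List[str], bucket_size: int = 64
) -> Dict[int, List[str]]:
    """Group sequences by their length rounded up to a multiple of bucket_size.

    Hypothetical helper: a jit-compiled model is retraced for every new input
    shape, so padding all members of a group to the same length means one
    compilation per group instead of one per distinct sequence length.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")

    buckets: Dict[int, List[str]] = defaultdict(list)
    for seq in sequences:
        # Ceiling division, then scale back up to the padded length.
        padded_length = -(-len(seq) // bucket_size) * bucket_size
        buckets[padded_length].append(seq)

    # Return groups ordered by padded length so short jobs run first.
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    example = ["A" * 10, "A" * 60, "A" * 64, "A" * 65, "A" * 130]
    for padded_length, group in pack_by_padded_length(example).items():
        print(padded_length, [len(s) for s in group])
```

A larger `bucket_size` means fewer distinct shapes and fewer compilations, at the cost of more wasted computation on padding; the right trade-off depends on the length distribution of the inputs.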