NVLM 1.0 is a family of advanced multimodal large language models (LLMs) from NVIDIA that sets new state-of-the-art results on vision-language tasks and performs strongly on text-only tasks, rivaling leading models such as GPT-4o and Llama 3. The open-source NVLM-1.0-D-72B model shows improved text-only performance after multimodal training. The model card provides detailed benchmark results and integration guidelines.
Table of contents
Model Details
Other Resources
Benchmark Results
How to use
Correspondence to
Citation
License