NVLM 1.0, a family of advanced multimodal large language models from NVIDIA, reports state-of-the-art results on vision-language tasks and strong results on text-only tasks, rivaling leading models such as GPT-4o and Llama 3. The open-sourced NVLM-1.0-D-72B model shows improved text-only performance after multimodal training, relative to its LLM backbone. Detailed benchmark results and integration guidelines are provided.

6m read time · From huggingface.co
Table of contents
Model Details, Other Resources, Benchmark Results, How to use, Correspondence to, Citation, License
