MiniCPM-V 2.6: A GPT-4V Level Multimodal LLM for Single Image, Multi-Image, and Video on Your Phone
MiniCPM-V 2.6 is a multimodal LLM built on the SigLip-400M vision encoder and the Qwen2-7B language model, with 8 billion parameters in total. It handles single-image, multi-image, and video understanding, and reports top scores on benchmarks such as OpenCompass, Mantis-Eval, and Video-MME. The model offers strong OCR capabilities and high token density (more pixels encoded per visual token), and is optimized for real-time video understanding on devices with limited resources. It supports a variety of formats and setups, making it usable for a wide range of visual processing tasks.