Google DeepMind's PaliGemma 2 Mix introduces new vision-language models tailored for a range of tasks including OCR, image captioning, and more. These models are available in multiple sizes, integrated with the Transformers ecosystem for ease of use, and support various image resolutions. They demonstrate enhanced performance,
Sort: