The voyage-multimodal-3 model is a new state-of-the-art model for multimodal embeddings. It vectorizes interleaved text and images and captures key visual features from sources such as PDFs, slides, and tables. Because it embeds this visual content directly, it eliminates the need for complex document parsing. In retrieval tasks it outperforms leading models such as OpenAI CLIP and Cohere multimodal v3. The model processes both text and visuals within the same transformer encoder, which gives it robust performance on mixed-modality search.
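To make the retrieval use case concrete, here is a minimal sketch of how a mixed-modality search might rank documents once every item has been embedded into the same vector space. The vectors are made-up placeholders; in practice they would come from the embedding model. The document ids and the helper names `cosine_similarity` and `rank_documents` are hypothetical. Cosine similarity is assumed as the scoring function, as it is a common choice for embedding retrieval, not something the post confirms.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Sort (doc_id, score) pairs from most to least similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings (assumption): real ones would be produced by the
# embedding model for a slide screenshot, a PDF table, and a plain-text note.
doc_vecs = {
    "slide_screenshot": [0.9, 0.1, 0.0],
    "pdf_table": [0.1, 0.8, 0.3],
    "plain_text_note": [0.0, 0.2, 0.9],
}

# Placeholder embedding of a text query (assumption).
query_vec = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because text and visuals share one embedding space, the same ranking function applies whether a document is a screenshot, a table, or plain text.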

8 min read · From blog.voyageai.com
Table of contents
Support for Interleaved Text & Images
Mixed Modality Search with Screenshots
Evaluation Details
Results
Try voyage-multimodal-3 now!
