Voyage-multimodal-3 is a new state-of-the-art multimodal embedding model. It vectorizes interleaved text and images and captures key visual features from sources such as PDFs, slides, and tables, which removes the need for complex document parsing. It outperforms leading models such as OpenAI CLIP and Cohere multimodal v3 on retrieval tasks. Because the model processes text and visuals within the same transformer encoder, it performs robustly on mixed-modality searches.
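To make the idea of mixed-modality search concrete, here is a minimal sketch of how retrieval over a shared embedding space typically works: every document, whether a screenshot, a table, or plain text, is represented as one vector, and candidates are ranked by cosine similarity to the query vector. The three-dimensional vectors and the document names below are made-up placeholders for illustration, not output from voyage-multimodal-3; in practice the vectors would come from the model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings (assumption: real ones would be produced by
# the embedding model and would have far more dimensions).
docs = {
    "slide_screenshot": [0.9, 0.1, 0.0],
    "pdf_table": [0.1, 0.9, 0.1],
    "plain_text": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because all modalities share one vector space, the ranking step does not need to know whether a given document was originally text, an image, or a mix of both.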

Table of contents
Support for Interleaved Text & Images
Mixed Modality Search with Screenshots
Evaluation Details
Results
Try voyage-multimodal-3 now!