Salesforce Research, together with academic partners, has introduced BLIP3-o, an open-source multimodal model that combines CLIP embeddings with Flow Matching for image understanding and image generation. The model is trained sequentially so that the two tasks are handled in separate stages, which is reported to improve performance. Two versions, which differ in parameter count and training data sources, are reported to outperform existing models on benchmarks.