Salesforce Research, together with academic partners, has introduced BLIP3-o, an open-source multimodal model that combines CLIP embeddings with Flow Matching for both image understanding and image generation. The model is trained sequentially, handling the two tasks in separate stages rather than jointly, which is reported to improve performance. Two versions, which differ in parameter count and training data sources, report stronger benchmark results than existing models.

4m read time, from marktechpost.com