Salesforce Research, together with academic partners, has introduced BLIP3-o, an open-source multimodal model that combines CLIP embeddings with Flow Matching for image understanding and generation. The model is trained sequentially, handling the two tasks in separate stages, an approach intended to improve performance. Two versions, with different