The Qwen Team has released Qwen-Image, an open-source image foundation model that excels at text-to-image generation and image editing. The model combines Qwen2.5-VL for text processing, a VAE for image encoding, and a Multimodal Diffusion Transformer (MMDiT) for generation. It outperforms other models on various benchmarks and ranks third on AI Arena, where it competes against closed models such as GPT Image 1. The model was trained on billions of annotated image-text pairs using progressive scaling strategies and reinforcement learning from human feedback.