Scaling multimodal AI with CuMo, a method that integrates sparse Mixture-of-Experts (MoE) blocks into multimodal large language models (LLMs) for efficient scaling while maintaining performance. The approach employs co-upcycling, in which each MoE expert is initialized from pretrained dense weights, together with a three-stage training process, and achieves strong results on visual question-answering benchmarks and multimodal reasoning challenges.
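The two core ideas, sparse top-k expert routing and upcycling experts from a dense MLP, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions (ReLU experts, noisy copies of the dense weights, a linear router), not CuMo's actual implementation; all names here are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoEBlock:
    """Hypothetical sketch: a sparse MoE MLP block whose experts are
    'upcycled' (copied, with small noise) from a pretrained dense MLP."""

    def __init__(self, dense_w1, dense_w2, n_experts=4, top_k=2,
                 noise=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Upcycling: each expert starts as a near-copy of the dense MLP,
        # so the sparse model inherits the dense model's behavior and the
        # experts can then diverge during further training.
        self.w1 = [dense_w1 + noise * rng.standard_normal(dense_w1.shape)
                   for _ in range(n_experts)]
        self.w2 = [dense_w2 + noise * rng.standard_normal(dense_w2.shape)
                   for _ in range(n_experts)]
        # Linear router producing one logit per expert for each token.
        self.router = 0.02 * rng.standard_normal((dense_w1.shape[0], n_experts))
        self.top_k = top_k

    def forward(self, x):
        """x: (tokens, d_model) -> (tokens, d_model)."""
        logits = x @ self.router                          # (tokens, n_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]  # top-k experts/token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            # Renormalize gate weights over the selected experts only.
            gates = softmax(logits[t, top[t]])
            for g, e in zip(gates, top[t]):
                h = np.maximum(x[t] @ self.w1[e], 0.0)    # expert MLP, ReLU
                out[t] += g * (h @ self.w2[e])
        return out

d_model, d_hidden = 16, 32
rng = np.random.default_rng(42)
dense_w1 = rng.standard_normal((d_model, d_hidden)) * 0.1
dense_w2 = rng.standard_normal((d_hidden, d_model)) * 0.1
block = SparseMoEBlock(dense_w1, dense_w2)
y = block.forward(rng.standard_normal((5, d_model)))
```

Only `top_k` of the experts run per token, which is why the parameter count can grow with the number of experts while per-token compute stays close to that of the original dense MLP.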