Grok-1.5V is xAI's first-generation multimodal model, able to process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. It is reported to outperform its peers in real-world spatial understanding, as measured by the RealWorldQA benchmark, which evaluates the spatial understanding capabilities of multimodal models.