Grok-1.5V is xAI's first-generation multimodal model, able to process a range of visual information in addition to text. According to xAI, it outperforms its peers in real-world spatial understanding, as measured by RealWorldQA, a benchmark that evaluates the spatial understanding capabilities of multimodal models.

From x.ai