The post discusses several multimodal large language models (LLMs) and their capabilities, including Kosmos-2, Shikra, GPT-4V, and Gemini.

5m read time From ai.plainenglish.io
Table of contents
Kosmos-2
Shikra
GPT-4V
Gemini
