This post introduces LLaVA-Gemma, a compact vision-language model built on the Gemma large language model in two variants, Gemma-2B and Gemma-7B. It explores the trade-offs between computational efficiency and multimodal understanding in small-scale models.

From marktechpost.com