This post explains vision language models, their main building blocks, how to find the right model, and how to use them with transformers. It also mentions some open-source vision language models and benchmarks for evaluation.
7m read time • From huggingface.co
Table of contents
What is a Vision Language Model?
Overview of Open-source Vision Language Models
Finding the right Vision Language Model
Technical Details
Using Vision Language Models with transformers
Fine-tuning Vision Language Models with TRL