Meta's Perception Language Models (PLMs) are fully open-source vision-language models built without relying on closed-source proprietary models for training data. The architecture pairs Llama 3 as the base LLM with a Perception Encoder (PE), connected by a two-layer MLP projector, to handle text, image, and video inputs.
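The projector's role can be sketched as a small MLP mapping vision-encoder features into the LLM's embedding space. This is a minimal NumPy illustration, not Meta's implementation: the feature dimensions, GELU activation, and token count are all assumptions for the example.

```python
import numpy as np

def mlp_projector(vision_feats, w1, b1, w2, b2):
    """Two-layer MLP: project vision-encoder tokens into the LLM
    embedding space. GELU nonlinearity is a common choice here;
    the exact activation used by PLM is an assumption."""
    h = vision_feats @ w1 + b1
    # tanh approximation of GELU
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

# Hypothetical dimensions: 1024-d PE features -> 4096-d LLM embeddings
rng = np.random.default_rng(0)
d_vision, d_hidden, d_llm = 1024, 4096, 4096
tokens = rng.standard_normal((256, d_vision))        # 256 image-patch tokens
w1 = rng.standard_normal((d_vision, d_hidden)) * 0.02
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_llm)) * 0.02
b2 = np.zeros(d_llm)

projected = mlp_projector(tokens, w1, b1, w2, b2)
print(projected.shape)  # (256, 4096): one LLM-space embedding per patch token
```

The projected tokens would then be interleaved with text-token embeddings in the LLM's input sequence.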