What if You Could Turn Your Vision-Only Model into a VLM by only Training a Linear Layer using a Modest Amount of Unlabeled Images? Meet Text-to-Concept (and Back) via Cross-Model Alignment



Researchers from the University of Maryland and Meta AI propose a method for mapping text to concept vectors with off-the-shelf vision encoders that were trained without text supervision. The method aligns a vision model’s representation space with that of a CLIP model. It does so by learning a mapping between the two representation spaces, which extends this text-to-concept capability to off-the-shelf models.
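To make the idea of a learned mapping between representation spaces concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the features are synthetic stand-ins for real encoder outputs, the affine map is fit in closed form by least squares rather than trained by gradient descent, and the names `fit_linear_aligner`, `align`, and `cosine_scores` are hypothetical.

```python
import numpy as np

def fit_linear_aligner(src_feats, tgt_feats):
    """Fit W, b minimizing ||src_feats @ W + b - tgt_feats||^2 (least squares)."""
    n = src_feats.shape[0]
    # Append a column of ones so the bias is learned together with the weights.
    src_aug = np.hstack([src_feats, np.ones((n, 1))])
    sol, *_ = np.linalg.lstsq(src_aug, tgt_feats, rcond=None)
    return sol[:-1], sol[-1]

def align(src_feats, W, b):
    """Map vision-only features into the target (CLIP-like) space."""
    return src_feats @ W + b

def cosine_scores(aligned, text_vec):
    """Cosine similarity between each aligned image feature and one text vector."""
    a = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
    t = text_vec / np.linalg.norm(text_vec)
    return a @ t

# Synthetic stand-ins (assumption): in practice both feature sets would come
# from running the same unlabeled images through the two encoders.
rng = np.random.default_rng(0)
n_images, d_vision, d_clip = 500, 32, 16
vision_feats = rng.normal(size=(n_images, d_vision))
true_W = rng.normal(size=(d_vision, d_clip))
true_b = rng.normal(size=d_clip)
noise = 0.01 * rng.normal(size=(n_images, d_clip))
clip_feats = vision_feats @ true_W + true_b + noise

W, b = fit_linear_aligner(vision_feats, clip_feats)
aligned = align(vision_feats, W, b)

# Stand-in for a CLIP text embedding (assumption): reuse one target-space vector.
text_vec = clip_feats[0]
scores = cosine_scores(aligned, text_vec)
```

Once the features are aligned, any vector in the target space, such as a text embedding, can be compared directly with images encoded by the vision-only model. No labels are needed to fit the map, only paired features of the same images from both encoders.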