Language models trained purely on text have a solid understanding of the visual world: they can generate complex scenes and refine the images they produce. Researchers from MIT tested the visual knowledge of these models and trained a computer vision system without using any visual data directly. The models show creativity by drawing the same concept differently each time, and their visual knowledge can be combined with other AI tools for improved results.

From news.mit.edu