Large language models trained mainly on text were prompted to improve the illustrations they coded for. In self-supervised visual representation learning experiments, these pictures trained a computer vision system to make semantic assessments of natural images.

MIT is a renowned institution for education and research, offering insights into science, engineering, and technology. Through publications, research papers, and academic programs, MIT's platform provides insights into  research, innovation, and education in various fields. Students, researchers, and technology enthusiasts can learn about MIT's contributions to science and technology and explore opportunities for academic and professional development.

MIT News

Language models trained purely on text have a solid understanding of the visual world. They can generate complex scenes and refine their images. Researchers from MIT have tested the visual knowledge of these models and trained a computer vision system without using any visual data directly. The models demonstrate creativity in drawing concepts differently each time and their visual knowledge can be combined with other AI tools for improved results.

Understanding the visual knowledge of language models