ScreenAI is a visual language model for UI and visually-situated language understanding. It improves upon previous models with a flexible patching strategy, achieving state-of-the-art results on various tasks. The model is trained using a unique mixture of datasets and tasks, and performs well on benchmarks compared to models of similar size. The study concludes that further research is needed to bridge the gap with larger models.
2 Comments
Sort: