Reworkd's Tarsier provides vision utilities for web interaction agents. It addresses three problems: how to feed a webpage to an LLM, how to map the LLM's responses back to web elements, and how to inform a text-only LLM about the page's visual structure.

From github.com
Table of contents: How does it work?, Installation, Usage, Local Development, Supported OCR Services, Roadmap, Citations