Reworkd's Tarsier provides vision utilities for web interaction agents, solving problems like feeding webpages to an LLM, mapping LLM responses to web elements, and informing text-only LLMs about the page's visual structure.
Table of contents
How does it work?
Installation
Usage
Local Development
Supported OCR Services
Roadmap
Citations