A developer built an AI-assisted web scraper on top of GPT-4o's structured outputs. The model handled both simple and complex tables well, but it struggled with tables that contain combined rows and with generating XPaths. Cost is a concern because of the large volume of HTML characters that must be sent to the model. Possible improvements include a better UX that captures browser events and more aggressive cleanup of the HTML before it is sent.
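Because cost grows with the amount of HTML sent to the model, a cleanup pass before the request is one natural lever. The following is a minimal sketch of such a pass using only Python's standard library. The summary does not say which cleanup rules the author used, so the names `slim_html`, `SKIP_TAGS`, and `KEEP_ATTRS`, and the choice of what to drop, are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these elements carry nothing useful for table extraction.
SKIP_TAGS = {"script", "style", "svg", "noscript"}
# Assumption: only attributes that describe table structure are worth keeping.
KEEP_ATTRS = {"colspan", "rowspan"}


class HtmlSlimmer(HTMLParser):
    """Re-emits HTML without skipped elements, extra attributes, or extra whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in KEEP_ATTRS and value is not None
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.parts.append(escape(text, quote=False))


def slim_html(html: str) -> str:
    """Return a smaller version of `html` that keeps tags, text, and table spans."""
    parser = HtmlSlimmer()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    raw = '<div class="x"><script>var a = 1;</script><table><tr><td colspan="2" class="c"> A &amp; B </td></tr></table></div>'
    slim = slim_html(raw)
    print(slim)
    print(f"{len(raw)} -> {len(slim)} characters")
```

Keeping `colspan` and `rowspan` is a deliberate choice here, since the model already has trouble with combined rows and removing those attributes would hide the only signal that cells are merged.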

6m read time • From blancas.io
Table of contents
Asking GPT-4o to scrape data
Parsing complex tables
Combined rows break the model
Asking GPT-4o to return XPaths
Combining the two approaches
GPT-4o is very expensive
Conclusions and demo