In December 2022, a website was created to track price changes in Greece's largest supermarkets, using Playwright for web scraping. The main challenges were scraping JavaScript-rendered sites, automating the scraping process, and avoiding IP restrictions. After initial attempts with an old laptop failed, Hetzner was chosen for its cost-efficiency. The setup uses Tailscale to work around IP restrictions and a CI server to run the daily scraping tasks. Optimizations targeted scrape speed and cost, for example by upgrading the server specs and reducing the amount of data fetched.
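
To make the "JavaScript-based sites" and "reducing data fetched" points concrete, here is a minimal sketch using Playwright's Python sync API. It is not the original post's code. The set of blocked resource types, the helper names `should_block` and `scrape_texts`, and the `url` and `selector` arguments are all assumptions made for illustration.

```python
# Hypothetical sketch: render a JavaScript-driven page with Playwright while
# skipping heavy resources. Not taken from the original post.

# Assumption: images, media and fonts are not needed to read prices.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True if a request of this resource type can be skipped."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def scrape_texts(url: str, selector: str) -> list[str]:
    """Load `url` in headless Chromium and return the text of every element
    matching `selector` (both are placeholders supplied by the caller)."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle_route(route):
        # Abort requests for blocked resource types, let everything else through.
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.route("**/*", handle_route)
        # Wait until network activity settles so client-side rendering finishes.
        page.goto(url, wait_until="networkidle")
        texts = page.locator(selector).all_inner_texts()
        browser.close()
    return texts
```

Calling `scrape_texts` with a real product-listing URL and a CSS selector for the price elements would return their visible text. Which resource types are safe to block depends on the target site, so the set above should be treated as a starting point only.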

10 min read time. From sakisv.net
Table of contents: Scraping js sites, Automating, Avoiding IP restrictions, How and when does it fail?, Optimising, Cost, Conclusion