A blog owner explains why legitimate users with older browsers are being blocked: the site has put anti-crawler measures in place to combat high-volume scrapers that use old Chrome user agents to collect LLM training data. The post details the challenge of distinguishing legitimate traffic from malicious crawlers, notes specific issues with the Vivaldi browser's user agent masking and with archive.today's crawling behavior, which mimics that of bad actors, and provides contact information for reporting false positives.

2m read time
From utcc.utoronto.ca
Table of contents
A special note for people using Vivaldi
A special note for people using archive.*