Cloudflare has launched 'Markdown for Agents', allowing AI crawlers to request Markdown versions of web pages via the Accept: text/markdown HTTP header, reducing token usage significantly compared to HTML. Alongside this, Cloudflare proposes 'Content Signals', a mechanism for publishers to declare in robots.txt whether their content can be used for AI training, search indexing, or inference. The signals are preferences only, not enforceable. Google's John Mueller criticized the approach, arguing LLMs can already parse HTML and that flattening to Markdown removes context. Publishers remain divided on AI scraping, with outlets like Medium blocking AI crawlers by default. Cloudflare is also experimenting with a pay-per-crawl model returning HTTP 402 responses.

3m read timeFrom infoq.com
Post cover image

Sort: