OpenAI explains how it protects against URL-based data exfiltration when AI agents automatically fetch web content. The core defense uses an independent web index to verify that a URL has already been seen publicly before allowing automatic retrieval. If a URL hasn't been seen publicly, the system either blocks it or requires explicit user
Table of contents
The problem: a URL can carry more than a destination
Why simple “trusted site lists” aren’t enough
Our approach: allow automatic fetching only for URLs that are already public
What you might see as a user
What this protects against and what it doesn’t
Looking ahead
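
The defense summarized at the top can be illustrated with a minimal sketch. This is not OpenAI's implementation. The names `Decision`, `decide_fetch`, and `public_index` are hypothetical, and the independent web index is stood in for by a plain in-memory set of URLs. The sketch assumes the check compares the full URL, including its query string, so that a URL carrying injected private data does not match a public page on the same site.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical outcomes of the pre-fetch check."""
    AUTO_FETCH = "auto_fetch"
    BLOCK_OR_ASK_USER = "block_or_ask_user"


def decide_fetch(url: str, public_index: set[str]) -> Decision:
    """Allow automatic fetching only for URLs already seen publicly.

    `public_index` is an assumed stand-in for an independent web index:
    a set of exact URLs known to exist on the public web.
    """
    # Exact match on the whole URL (assumption): a trusted host alone is
    # not enough, because the path or query string could carry user data.
    if url in public_index:
        return Decision.AUTO_FETCH
    # Unseen URLs are never fetched automatically in this sketch.
    return Decision.BLOCK_OR_ASK_USER


# Usage with a toy index containing a single public page.
index = {"https://example.com/docs"}
known = decide_fetch("https://example.com/docs", index)
unseen = decide_fetch("https://example.com/docs?q=private-note", index)
```

Here `known` resolves to automatic fetching, while `unseen` does not, even though both URLs share the same host. That difference is the point of checking the URL itself rather than relying on a list of trusted sites.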