[2604.15597] LLMs Corrupt Your Documents When You Delegate

A research paper introduces DELEGATE-52, a benchmark that simulates long delegated workflows across 52 professional domains to evaluate LLM reliability in document-editing tasks. Testing 19 LLMs reveals that even frontier models (Gemini, Claude, GPT) corrupt an average of 25% of document content during extended interactions. Key findings: agentic tool use does not improve performance, and degradation worsens with document size, interaction length, and the presence of distractor files. The errors are sparse but severe, and they compound silently over time, making current LLMs unreliable for delegated knowledge work.

2 min read · From arxiv.org
