A study of enterprise AI agents finds a tradeoff between task completion and privacy protection with stark implications for devs.

LeadDev

A new study using the CI-Work benchmark of 125 simulated enterprise tasks finds that frontier LLM agents leak sensitive corporate data at rates between 16% and 51%. The core problem is that LLMs cannot distinguish task-relevant information from contextually inappropriate data when retrieving from sources like Slack, emails, and meeting transcripts. Counterintuitively, more capable models and more thorough retrieval instructions worsen privacy violations. Researchers conclude that models cannot self-police: engineering teams must implement least-privilege access controls, context-aware filtering, and audit logs before data reaches the model's prompt window. The safest enterprise agent is not the most capable model but the best-constrained system.
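To make the closing recommendation concrete, here is a minimal sketch of a filter that sits between retrieval and the prompt. It is an illustration under stated assumptions, not the researchers' method: the `Document`, `Task`, and `AuditEntry` types, the `filter_for_prompt` and `build_prompt` functions, and the idea of pre-assigned topic tags are all hypothetical. In a real system, context-aware filtering would likely need a policy engine or classifier rather than a simple tag match.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # e.g. "slack", "email", "meeting_transcript"
    topic: str   # hypothetical tag assigned upstream of the agent
    text: str


@dataclass(frozen=True)
class Task:
    user: str
    allowed_sources: frozenset  # least privilege: sources this task may read
    allowed_topics: frozenset   # crude stand-in for context-aware filtering


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    user: str
    doc_id: str
    decision: str
    reason: str


def filter_for_prompt(task, retrieved, audit_log):
    """Return only documents the task may see, logging every decision."""
    permitted = []
    for doc in retrieved:
        if doc.source not in task.allowed_sources:
            decision, reason = "blocked", "source outside task scope"
        elif doc.topic not in task.allowed_topics:
            decision, reason = "blocked", "topic not relevant to task"
        else:
            decision, reason = "allowed", "within task scope"
            permitted.append(doc)
        audit_log.append(
            AuditEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                user=task.user,
                doc_id=doc.doc_id,
                decision=decision,
                reason=reason,
            )
        )
    return permitted


def build_prompt(instruction, docs):
    """Assemble the prompt from documents that already passed the filter."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return f"{instruction}\n\nContext:\n{context}"


# Invented example data: one relevant document and two that should never
# reach the model for this task.
retrieved = [
    Document("d1", "slack", "release_planning", "Launch moved to 14 March."),
    Document("d2", "email", "compensation", "Salary bands for next year."),
    Document("d3", "hr_portal", "release_planning", "Leave dates for the team."),
]
task = Task(
    user="alice",
    allowed_sources=frozenset({"slack", "email"}),
    allowed_topics=frozenset({"release_planning"}),
)
audit_log = []
permitted = filter_for_prompt(task, retrieved, audit_log)
prompt = build_prompt("Summarise the release schedule.", permitted)
```

The design choice is that the decision happens in ordinary code before the prompt is built, so a blocked document cannot be leaked no matter how capable or thorough the model is, and the audit log records blocked and allowed documents alike.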

Frontier AI models haemorrhage sensitive data