Document Poisoning in RAG Systems: How Attackers Corrupt Your AI's Sources

A hands-on demonstration of knowledge base poisoning in RAG systems: three fabricated documents injected into a ChromaDB collection caused an LLM to report completely false financial data with a 95% success rate. The attack satisfies both the retrieval condition (cosine similarity) and the generation condition (authority framing) without any jailbreak or software exploit. Five defense layers were tested independently. Ingestion sanitization had no effect, while embedding anomaly detection at ingestion reduced attack success from 95% to 20%, far outperforming prompt hardening, access controls, and output monitoring. Combining all five layers brought residual attack success down to 10%. Practical recommendations include mapping all write paths into the knowledge base, implementing embedding anomaly detection at ingestion (~50 lines of Python), and using ML-based output monitoring rather than regex. Full lab code is available on GitHub.
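To make the idea of embedding anomaly detection at ingestion concrete, here is a minimal sketch of one plausible heuristic. It is not the article's lab code. The function name `flag_suspicious_batch`, both thresholds, and the two signals it checks (a new document sitting unusually close to an existing one, and several new documents clustering tightly together) are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_suspicious_batch(
    existing: List[Vector],
    incoming: List[Vector],
    override_threshold: float = 0.85,  # assumed value, tune on your own data
    cluster_threshold: float = 0.90,   # assumed value, tune on your own data
) -> List[int]:
    """Return sorted indices of incoming embeddings that look anomalous.

    Hypothetical heuristic with two signals:
      1. The candidate is very close to a document already in the knowledge
         base, so it may be trying to override that document's content.
      2. Two or more candidates in the same batch are very close to each
         other, which may indicate a coordinated injection.
    """
    flagged = set()
    for i, candidate in enumerate(incoming):
        # Signal 1: near-duplicate of an existing document's topic.
        if any(cosine_similarity(candidate, e) >= override_threshold for e in existing):
            flagged.add(i)
        # Signal 2: tight cluster inside the incoming batch.
        for j in range(i + 1, len(incoming)):
            if cosine_similarity(candidate, incoming[j]) >= cluster_threshold:
                flagged.update((i, j))
    return sorted(flagged)
```

In a real pipeline the vectors would come from the same embedding model the collection uses, and flagged documents would go to a human review queue instead of being written to the knowledge base. Legitimate updates to existing documents will also trip the first signal, so the thresholds need calibrating against normal ingestion traffic.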

#rag #vector-search #ai-security #prompt-injection

Mar 13 • 13 min read • From aminrj.com

Table of contents

The Setup: 100% Local, No Cloud Required
The Theory: PoisonedRAG’s Two Conditions
Building the Attack: Three Documents, One Objective
Running It
What Makes This Dangerous in Production
The Defense That Surprised Me
The 10% That Gets Through
Implications for Your Production RAG
Read More in This Series
