A hands-on demonstration of knowledge base poisoning in RAG systems: three fabricated documents injected into a ChromaDB collection caused an LLM to report completely false financial data, with a 95% success rate. The attack satisfies both a retrieval condition (the poisoned documents rank highly by cosine similarity) and a generation condition (authority framing persuades the model to trust them), and it needs no jailbreak or software exploit. Five defense layers were tested independently. Ingestion sanitization had no effect, while embedding anomaly detection at ingestion reduced attack success from 95% to 20%, far outperforming prompt hardening, access controls, and output monitoring. Combining all five layers brought residual success down to 10%. Practical recommendations include mapping every write path into the knowledge base, implementing embedding anomaly detection at ingestion (~50 lines of Python), and using ML-based output monitoring rather than regex. Full lab code is available on GitHub.

13 min read · From aminrj.com
Table of contents
The Setup: 100% Local, No Cloud Required
The Theory: PoisonedRAG’s Two Conditions
Building the Attack: Three Documents, One Objective
Running It
What Makes This Dangerous in Production
The Defense That Surprised Me
The 10% That Gets Through
Implications for Your Production RAG
Read More in This Series
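
The summary credits embedding anomaly detection at ingestion with the largest drop in attack success. The original lab code is not reproduced here, so the following is only a minimal sketch of what such a check might look like, under stated assumptions: embeddings arrive as plain lists of floats, and a new document is treated as suspicious if it sits very close to an existing document (a possible override attempt) or very close to another document in the same batch (a possible coordinated injection). The function names and both thresholds are hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_suspicious_batch(
    new_vectors: List[Vector],
    existing_vectors: List[Vector],
    override_threshold: float = 0.85,  # hypothetical value, tune on your own data
    cluster_threshold: float = 0.90,   # hypothetical value, tune on your own data
) -> List[int]:
    """Return sorted indices of new vectors that should be held for review."""
    flagged = set()
    for i, vec in enumerate(new_vectors):
        # Check 1: the new document nearly duplicates something already stored.
        if any(cosine_similarity(vec, old) >= override_threshold
               for old in existing_vectors):
            flagged.add(i)
        # Check 2: the new document clusters tightly with a sibling in this batch.
        for j in range(i + 1, len(new_vectors)):
            if cosine_similarity(vec, new_vectors[j]) >= cluster_threshold:
                flagged.update((i, j))
    return sorted(flagged)


if __name__ == "__main__":
    existing = [[1.0, 0.0, 0.0]]
    batch = [[0.99, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(flag_suspicious_batch(batch, existing))
```

In a real pipeline the vectors would come from the same embedding model the collection uses, and flagged documents would go to a review queue rather than being dropped silently. The pairwise loop is quadratic in batch size, which is fine for small ingestion batches but would need an index for large ones.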
