90ms to Total Recall
This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).
A developer built a RAG pipeline that automatically injects personal knowledge into every Claude Code prompt in under 100ms. The system unifies 395 Obsidian notes and ~2,800 AI session memories into a single pgvector store, using Ollama's nomic-embed-text for embeddings and R2R as the RAG backend, all running on a homelab Kubernetes cluster. A Python hook fires on every prompt via Claude Code's UserPromptSubmit hook, searches the vector index (~90ms), and injects relevant chunks as system context before the agent responds. Key implementation details include: using a plain StatefulSet instead of CloudNativePG operator for simplicity, disabling R2R's default document summary generation, a 0.45 cosine similarity threshold, and automatic triggering with :rag/:norag escape hatches. The entire retrieval path is local with zero token costs.
Table of contents
How Vector Search WorksIngestionSearchOne store, two sourcesThe Database JourneyGet Piotr ’s stories in your inboxIngesting the Obsidian VaultR2R ConfigurationThe RAG HookResultLinksSort: