90ms to Total Recall


A developer built a RAG pipeline that automatically injects personal knowledge into every Claude Code prompt in under 100ms. The system unifies 395 Obsidian notes and ~2,800 AI session memories into a single pgvector store, using Ollama's nomic-embed-text for embeddings and R2R as the RAG backend, all running on a homelab Kubernetes cluster. A Python script registered on Claude Code's UserPromptSubmit hook runs on every prompt, searches the vector index (~90ms), and injects relevant chunks as system context before the agent responds. Key implementation details include: a plain StatefulSet instead of the CloudNativePG operator for simplicity, R2R's default document summary generation disabled, a 0.45 cosine similarity threshold, and automatic triggering with :rag/:norag escape hatches. The entire retrieval path is local, so retrieval itself incurs zero token costs.
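The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the author's hook: `build_context`, `embed`, `index`, and `top_k` are hypothetical names, the index is an in-memory list rather than pgvector behind R2R, and the prefix semantics are an assumption (here `:norag` skips retrieval and `:rag` is simply stripped), since the summary does not define them. Only the 0.45 threshold comes from the summary.

```python
import math
from typing import Callable, List, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.45  # cosine similarity cutoff quoted in the summary


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_context(
    prompt: str,
    embed: Callable[[str], Sequence[float]],      # hypothetical embedding function
    index: List[Tuple[str, Sequence[float]]],     # hypothetical (chunk_text, vector) pairs
    threshold: float = SIMILARITY_THRESHOLD,
    top_k: int = 5,                               # assumed limit, not from the summary
) -> str:
    """Return the text to inject for this prompt, or "" if nothing qualifies."""
    text = prompt.strip()
    # Assumed escape-hatch semantics: ":norag" opts out of retrieval entirely.
    if text.startswith(":norag"):
        return ""
    # Assumed: ":rag" is an explicit request, so just strip the prefix.
    if text.startswith(":rag"):
        text = text[len(":rag"):].strip()

    query_vec = embed(text)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    # Keep only chunks at or above the threshold, best match first.
    hits = sorted(
        (pair for pair in scored if pair[0] >= threshold),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    return "\n\n".join(chunk for _, chunk in hits)
```

In the described setup the similarity search would run inside the vector store rather than in Python; the sketch only shows how a threshold filters out weak matches, so that unrelated prompts get no injected context.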

#kubernetes

#rag

#ollama

#claude-code

#pgvector

Mar 05 • 10m read time • From itnext.io

Table of contents

How Vector Search Works
Ingestion
Search
One store, two sources
The Database Journey
Ingesting the Obsidian Vault
R2R Configuration
The RAG Hook
Result
Links
