Best of RAG — October 2025

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·34w
A 100% Open-source Alternative to n8n!
Sim is an open-source drag-and-drop platform for building agentic workflows that runs locally with any LLM. The article demonstrates building a finance assistant connected to Telegram using agents, MCP servers, and APIs. It also covers four RAG indexing strategies: chunk indexing (splitting documents into embedded chunks), sub-chunk indexing (breaking chunks into finer pieces while retrieving larger context), query indexing (generating hypothetical questions for better semantic matching), and summary indexing (using LLM-generated summaries for dense data).
156
7
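As a rough illustration of sub-chunk indexing from the summary above, the sketch below matches a query against small pieces but returns the larger parent chunk. It is not taken from the article: the function names are hypothetical, sentences stand in for sub-chunks, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sub_chunk_index(chunks: list[str]) -> list[tuple[Counter, int]]:
    """Index every sentence (sub-chunk) with a pointer to its parent chunk."""
    index = []
    for parent_id, chunk in enumerate(chunks):
        for sentence in chunk.split(". "):
            if sentence.strip():
                index.append((embed(sentence), parent_id))
    return index


def retrieve(query: str, chunks: list[str], index: list[tuple[Counter, int]]) -> str:
    """Match on the fine-grained sub-chunk, return the larger parent chunk."""
    q = embed(query)
    _, parent_id = max(index, key=lambda entry: cosine(q, entry[0]))
    return chunks[parent_id]
```

Matching on small pieces keeps the similarity signal sharp, while returning the parent chunk gives the LLM the surrounding context.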
2
Article
Hacker News·34w
The RAG Obituary: Killed by Agents, Buried by Context Windows
The author argues that RAG (Retrieval-Augmented Generation) architectures are becoming obsolete as LLM context windows expand from 4K to 2M+ tokens, and that agentic search systems using simple tools like grep and filesystem navigation outperform complex RAG pipelines involving chunking, embeddings, hybrid search, and reranking. Drawing on experience building financial research platforms, they demonstrate how agents can navigate complete documents and follow cross-references naturally, avoiding the infrastructure burden and accuracy problems inherent in fragment-based retrieval. In this view, the shift from context scarcity to abundance changes how AI systems should process information.
55
3
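To make the "simple tools like grep" idea concrete, here is a minimal sketch of a grep-style search function that an agent could call as a tool. The name `grep_tool`, the `.txt` filter, and the return shape are assumptions for illustration, not the author's implementation.

```python
import re
from pathlib import Path


def grep_tool(root: str, pattern: str, max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Search text files under `root` for a regex.

    Returns (relative path, line number, line) tuples so the agent can
    decide which file to open in full next.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # cap output so the tool result stays small
    return hits
```

An agent would typically call this with a pattern such as `r"note \d+"`, then read the matching document in full rather than working from isolated fragments.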
3
Article
Closer to Code·29w
Announcing llm-docs-builder: An Open Source Tool for Making Documentation AI-Friendly
llm-docs-builder is an open-source tool that transforms Markdown documentation into AI-optimized formats, reducing token usage by 85-95% compared to HTML versions. It strips noise like CSS, JavaScript, and HTML boilerplate while preserving semantic structure and context hierarchy. The tool generates llms.txt indexes for AI discoverability and can be configured to automatically serve optimized Markdown to AI crawlers while maintaining HTML for human visitors. Real-world metrics from the Karafka framework show 20-36x file size reductions, translating to lower RAG costs and fewer hallucinations.
37
1
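The size reduction comes from dropping markup that carries no meaning for a model. The sketch below shows the general idea with Python's standard `html.parser`; it is not how llm-docs-builder works internally, and unlike that tool it keeps only visible text and discards heading structure. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_noise(html: str) -> str:
    """Return the visible text of an HTML page, one fragment per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```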
4
Article
Meilisearch·32w
Introducing Meilisearch Chat
Meilisearch launches a new /chat endpoint that transforms existing search indexes into conversational AI experiences. The feature provides built-in RAG capabilities without requiring separate vector databases or complex infrastructure. It offers an OpenAI-compatible API that handles the entire retrieval and generation workflow automatically, enabling developers to add natural language question-answering to applications in days rather than months. Use cases include e-commerce product discovery, customer support, internal knowledge bases, and developer documentation.
16
2
5
Article
SingleStore·32w
Context Engineering: A Definitive Guide
Context engineering is a systematic approach to building AI systems that goes beyond prompt engineering by designing the complete environment in which AI operates. It involves structuring data sources, integrating tools, maintaining memory across interactions, and ensuring AI agents have access to relevant information when needed. The article explains how context engineering differs from prompt engineering and RAG systems, introduces the Model Context Protocol (MCP) as a standardized interface for managing data sources, and demonstrates building context-aware workflows using SingleStore as a long-term memory layer with vector search capabilities.
15
6
Article
Meilisearch·33w
RAG vs. CAG: The Smarter Choice for Your AI Stack
Retrieval-Augmented Generation (RAG) fetches live data from external sources for accurate, up-to-date responses but is slower and more expensive. Cache-Augmented Generation (CAG) uses pre-stored information for faster, cost-effective answers but risks serving outdated content. RAG suits scenarios requiring real-time accuracy like financial updates or unpredictable queries, while CAG excels at repetitive tasks like FAQ chatbots. Hybrid approaches combine both, using cached responses for speed while falling back to live retrieval when needed. The choice depends on query patterns, budget, performance requirements, and data freshness needs.
15
1
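The hybrid approach described above can be sketched as a cache with a freshness window in front of a live retrieval call. This is a minimal illustration, not code from the article: `HybridAnswerer`, its parameters, and the time-to-live policy are assumptions, and `retrieve_and_generate` stands in for a full RAG pipeline.

```python
import time
from typing import Callable


class HybridAnswerer:
    """Serve cached answers while fresh; fall back to live retrieval otherwise."""

    def __init__(
        self,
        retrieve_and_generate: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = retrieve_and_generate  # hypothetical live RAG pipeline
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}

    def answer(self, query: str) -> tuple[str, str]:
        """Return (answer, source), where source is "cache" or "live"."""
        key = " ".join(query.lower().split())  # normalise case and whitespace
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1], "cache"
        result = self._generate(query)
        self._cache[key] = (now, result)
        return result, "live"
```

A short `ttl_seconds` leans toward RAG-style freshness; a long one leans toward CAG-style speed and cost, at the risk of stale answers.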
7
Article
Faun·33w
Deploying a Complete RAG Ecosystem with a Single Command: My Ultimate Docker Stack
The article presents a Docker Compose stack that deploys a complete RAG (Retrieval-Augmented Generation) infrastructure with a single command. The setup includes Ollama for local LLM execution, Qdrant for vector search, MongoDB for document storage, Redis for caching, Neo4j for knowledge graphs, Keycloak for authentication, and n8n for workflow automation. The stack can be configured for CPU-only, GPU-accelerated, or external API usage, with automated setup scripts that handle dependencies and provide instant access to all services. Neo4j integration enables advanced relationship mapping between documents and entities, enriching context beyond traditional vector search.
15
8
Article
Daily Dose of Data Science | Avi Chawla | Substack·30w
Another MCP Moment by Anthropic?
Anthropic released Claude Skills, a feature designed to solve agent memory persistence by acting as standard operating procedures for AI agents. The announcement includes comparisons to Model Context Protocol (MCP), projects, and subagents, with practical examples of building custom skills. The piece also promotes a comprehensive MCP crash course series covering fundamentals, architecture, integration with frameworks like LangGraph and LlamaIndex, and real-world implementations.
12
9
Article
Hacker News·30w
Context engineering is sleeping on the humble hyperlink
Context engineering for LLMs faces a key challenge: providing all necessary context without overwhelming the model. While techniques like RAG and subagents help, hyperlinks offer an underutilized solution. By implementing a simple read_resources tool that accepts URIs, agents can load relevant context on demand, similar to how humans navigate documentation. This approach is token-efficient, flexible, and enables just-in-time context loading. Model Context Protocol (MCP) Resources provide the infrastructure needed, though most clients don't yet expose resources to models directly. The Firebase MCP Server demonstrates this pattern in practice with linked workflows for project initialization.
12
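The hyperlink pattern can be illustrated with a small in-memory sketch. Only the tool name `read_resources` comes from the summary above; the `docs://` URI scheme, the `RESOURCES` registry, and the `find_links` helper are hypothetical, and a real setup would resolve URIs through MCP Resources rather than a dictionary.

```python
import re

# Hypothetical resource registry; each document may link to others by URI.
RESOURCES = {
    "docs://guide/init": "To initialise a project, first read docs://guide/auth.",
    "docs://guide/auth": "Authentication uses a service account key.",
}


def read_resources(uris: list[str]) -> dict[str, str]:
    """Tool exposed to the agent: load the content behind each URI."""
    return {
        uri: RESOURCES.get(uri, f"ERROR: unknown resource {uri}")
        for uri in uris
    }


def find_links(text: str) -> list[str]:
    """Extract docs:// links the agent may choose to follow next."""
    return re.findall(r"docs://[\w/-]+", text)
```

The agent starts from one entry-point document and pulls in linked resources only when a task needs them, which keeps the initial context small.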
