Best of RAG — July 2025

1
Article
DigitalOcean Community·46w
LangChain Explained: The Ultimate Framework for Building LLM Applications
LangChain is an open-source Python framework that simplifies building LLM applications by providing standard interfaces for chat models, embeddings, and vector stores. It offers key components like chains for sequential operations, agents for autonomous decision-making, memory for conversation context, tools for external integrations, and vector stores for retrieval-augmented generation. The framework abstracts away complexity when connecting LLMs to external data sources and APIs, making it easier to build chatbots, question-answering systems, and other AI applications without reinventing common functionality.
142
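The "chains for sequential operations" idea in the LangChain summary can be illustrated without LangChain itself. This is a minimal plain-Python sketch, not LangChain code. Three steps (a prompt template, a model call, and an output parser) are composed with the | operator, loosely in the spirit of LangChain's pipe-style composition. The Step class and fake_llm function are hypothetical stand-ins, and no real model is called.

```python
from typing import Any, Callable


class Step:
    """Hypothetical chain step: wraps a function and composes with `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Composing two steps yields a new step that runs them in sequence.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Step 1: fill a prompt template from a dict of variables.
prompt = Step(lambda v: f"Answer briefly: what is {v['topic']}?")


def fake_llm(text: str) -> str:
    """Stand-in for a chat model (hypothetical; no LLM is called)."""
    return f"  MODEL OUTPUT for <{text}>  "


# Step 2: the "model" call.
model = Step(fake_llm)

# Step 3: parse the raw model output into a clean string.
parser = Step(lambda s: s.strip())

chain = prompt | model | parser
print(chain.invoke({"topic": "RAG"}))
```

Because each step only needs to accept the previous step's output, swapping the fake model for a real client would leave the template and parser untouched. That kind of interchangeability is what the summary means by standard interfaces.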
2
Article
Javarevisited·44w
Top 5 Books to Learn LLMs (Large Language Models) in Depth
A curated list of five essential books for learning Large Language Models in depth, covering everything from basic engineering concepts to production deployment. The recommendations include practical guides for building LLM applications, training models from scratch, and deploying them at scale. Each book targets different aspects of LLM development, from foundational architecture and prompt engineering to production monitoring and evaluation strategies.
109
3
Article
freeCodeCamp·44w
How AI Agents Remember Things: The Role of Vector Stores in LLM Memory
Large language models have no inherent memory, but vector stores let AI agents simulate it by converting text into numerical embeddings and storing them in specialized databases. When a user interacts with the agent, the system searches for semantically similar stored vectors to retrieve relevant past information. Popular options include FAISS, a similarity-search library suited to local deployments, and Pinecone, a managed cloud vector database. This approach, called retrieval-augmented generation (RAG), lets AI appear contextually aware, although it remains limited by similarity-based matching and static embeddings.
98
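The store-then-search loop described in the vector-store summary fits in a few lines. This is a sketch under loud assumptions: a bag-of-words term counter stands in for a real embedding model, and a plain Python list stands in for FAISS or Pinecone. Retrieval ranks stored memories by cosine similarity to the query.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Minimal in-memory vector store holding (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = MemoryStore()
memory.add("The user's favorite language is Python.")
memory.add("The user lives in Berlin.")
print(memory.search("Which programming language does the user like?"))
```

The toy embedding also shows the "static embeddings" limitation the summary mentions. A query about "living" would not match "lives" here, whereas a learned embedding model would usually place them close together.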
4
Article
SingleStore·45w
How to Build a RAG Knowledge Base in Python for Customer Support
A comprehensive guide to building a Retrieval-Augmented Generation (RAG) system for customer support using Python, LangChain, OpenAI, and SingleStore. The tutorial covers setting up a vector database, converting documents into embeddings, implementing semantic search, and generating contextual answers. Real-world case studies show a 28.6% reduction in issue resolution time. The step-by-step implementation includes environment setup, database configuration, embedding creation, and API endpoint development for instant, accurate support responses.
81
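The customer-support summary does not show the tutorial's code. The retrieve-then-generate step can still be sketched, assuming a tiny in-memory knowledge base and word-overlap (Jaccard) scoring in place of OpenAI embeddings and SingleStore vector search. The function names and prompt wording are hypothetical. The sketch shows how retrieved passages are numbered and placed into a grounded prompt, and what happens when nothing relevant is found.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by Jaccard word overlap with the query (stand-in for vector search)."""
    q = tokens(query)

    def score(doc: str) -> float:
        d = tokens(doc)
        return len(q & d) / len(q | d) if q | d else 0.0

    ranked = sorted(docs, key=score, reverse=True)
    # Drop passages with zero overlap so irrelevant text never reaches the model.
    return [doc for doc in ranked[:k] if score(doc) > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt: numbered context passages, then the question."""
    if not context:
        context_block = "(no relevant articles found)"
    else:
        context_block = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "You are a support assistant. Answer using only the context below "
        "and cite passages by number. If the answer is not there, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"
    )


kb = [
    "To reset your password, open Settings and choose Reset password.",
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]
query = "How do I reset my password?"
print(build_prompt(query, retrieve(query, kb)))
```

The resulting string is what would be sent to the chat model. Numbering the passages lets the model cite them, which makes wrong answers easier to audit.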
5
Article
Machine Learning Mastery·47w
5 Advanced RAG Architectures Beyond Traditional Methods
Five advanced RAG architectures that go beyond traditional retrieval-generation pipelines: Dual-Encoder Multi-Hop Retrieval breaks down complex queries into layered searches; Context-Aware Feedback Loops enable iterative self-improvement through confidence evaluation; Modular Memory-Augmented RAG maintains persistent, contextual memory across sessions; Agentic RAG integrates tool usage for active reasoning and real-time data processing; and Graph-Structured Context Retrieval uses knowledge graphs to find interconnected information rather than simple similarity matches.
78
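Of the five architectures, graph-structured context retrieval differs most from plain similarity search. Here is a minimal sketch, assuming a small hand-built knowledge graph and assuming that seed entities have already been extracted from the query. A breadth-first expansion collects (subject, relation, object) facts up to a hop limit. The graph contents and function name are illustrative, not taken from the article.

```python
from collections import deque

# Hypothetical knowledge graph: entity -> list of (relation, neighbor) edges.
GRAPH = {
    "Marie Curie": [("born_in", "Warsaw"), ("won", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Nobel Prize in Physics": [("awarded_by", "Royal Swedish Academy of Sciences")],
    "Poland": [],
    "Royal Swedish Academy of Sciences": [],
}


def graph_context(seeds: list[str], max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Breadth-first expansion from seed entities, returning (subject, relation, object) facts."""
    facts = []
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in GRAPH.get(node, []):
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts


for fact in graph_context(["Marie Curie"]):
    print(fact)
```

For a question like "In which country was Marie Curie born?", no single stored passage needs to mention both "Curie" and "Poland". The two-hop path born_in → capital_of connects them, which is the interconnected-information advantage the summary describes.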
6
Article
Daily Dose of Data Science | Avi Chawla | Substack·43w
What is Context Engineering?
Context engineering is emerging as a critical skill for AI engineers, focusing on systematically orchestrating context rather than just clever prompting. Unlike traditional prompt engineering, which relies on 'magic words', context engineering creates dynamic systems that provide the right information, tools, and format to LLMs. The approach addresses what the author sees as the real bottleneck in AI applications: not model capability, but a proper information architecture. Key components include dynamic information flow, smart tool access, memory management (both short-term and long-term), and format optimization. As AI models improve, context quality becomes the limiting factor for application success.
62
7
Article
Daily Dose of Data Science | Avi Chawla | Substack·44w
Prompting vs. RAG vs. Finetuning
A decision framework for choosing between prompt engineering, RAG, and fine-tuning when building LLM applications. The choice depends on two key factors: the amount of external knowledge required and the level of model adaptation needed. RAG works best for custom knowledge bases that need no behavior change, fine-tuning adapts the model's weights and therefore its behavior, prompt engineering suffices for basic adjustments, and hybrid approaches combine RAG with fine-tuning for complex requirements.
44
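The two-factor framework from the "Prompting vs. RAG vs. Finetuning" summary reduces to a small decision function. This is a simplification: the article treats both factors as a spectrum, while this sketch assumes each is a yes/no question. The function name is hypothetical.

```python
def choose_approach(needs_external_knowledge: bool, needs_behavior_change: bool) -> str:
    """Map the two factors (knowledge needed, adaptation needed) to a strategy."""
    if needs_external_knowledge and needs_behavior_change:
        return "hybrid (RAG + fine-tuning)"
    if needs_external_knowledge:
        return "RAG"  # custom knowledge, model behavior unchanged
    if needs_behavior_change:
        return "fine-tuning"  # adapt weights for tone, format, or domain style
    return "prompt engineering"  # basic adjustments only


for knowledge in (False, True):
    for behavior in (False, True):
        print(f"knowledge={knowledge!s:5} behavior={behavior!s:5} -> "
              f"{choose_approach(knowledge, behavior)}")
```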
8
Article
AI Developer·44w
How to build RAG in 2 minutes
A developer shares their experience building an AI-powered chatbot to streamline documentation searches using CustomGPT.ai. The platform offers no-code setup, natural language understanding, and seamless integration with existing documentation. Key benefits include quick setup, industry customization, automatic data ingestion from websites, and highly accurate responses based on specific data rather than generic AI outputs.
41
9
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
6 No-code LLM, Agents, and RAG Builder Tools for AI Engineers
Six open-source no-code tools enable AI engineers to build LLM applications, agents, and RAG systems without extensive programming. Featured tools include RAGFlow for document understanding, Langflow for visual agent building, LLaMA-Factory for model fine-tuning, Transformer Lab for local LLM experimentation, xpander for agent backends, and AutoAgent for natural language agent creation. These platforms collectively have over 200k GitHub stars and support various AI development workflows from training to deployment.
40
10
Article
The New Stack·46w
Context Engineering: Going Beyond Prompt Engineering and RAG
Context engineering is a comprehensive approach to LLM development that goes beyond simple prompt crafting. It involves designing dynamic systems that manage everything an LLM sees before generating a response, including system instructions, conversation history, retrieved documents, tool outputs, and guardrails. Unlike prompt engineering, which focuses on crafting individual queries, context engineering treats the entire context window as a curated information environment. It encompasses RAG as one component while addressing broader challenges like token budget management, information positioning, and maintaining consistency across varied inputs. This systematic approach transforms LLMs from basic chatbots into autonomous agents capable of complex reasoning and decision-making.
25
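Token budget management, one of the challenges named in the context-engineering summary, can be sketched as a greedy packer. Assumptions: word count approximates token count (real tokenizers differ), and each context part carries a hypothetical priority where a lower number is kept first. Kept parts are emitted in their original order, so the layout of the context window stays stable even when lower-priority material is dropped.

```python
def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def assemble_context(parts: list[tuple[str, int, str]], budget: int) -> list[str]:
    """Greedily keep the highest-priority parts that fit within the token budget.

    parts: (section, priority, text) tuples; a lower priority number is kept first.
    Kept parts are returned in their original order so the layout stays stable.
    """
    kept = set()
    used = 0
    for idx in sorted(range(len(parts)), key=lambda i: parts[i][1]):
        cost = count_tokens(parts[idx][2])
        if used + cost <= budget:
            kept.add(idx)
            used += cost
    return [f"{parts[i][0]}: {parts[i][2]}" for i in range(len(parts)) if i in kept]


parts = [
    ("system", 0, "You are a concise assistant."),
    ("history", 3, "User asked about pricing earlier and got the basic plan details."),
    ("retrieved", 1, "Pro plan costs 20 dollars per month."),
    ("user", 0, "What does the Pro plan cost?"),
]
for line in assemble_context(parts, budget=20):
    print(line)
```

With a 20-word budget, the older conversation history is the part that gets dropped. The system instruction, the retrieved fact, and the current question all fit.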
11
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
Will Long-Context LLMs Make RAG Obsolete?
LLMs with extended context windows (1M tokens or more) are challenging the necessity of RAG systems. Academic research shows mixed results: long-context models excel at multi-hop reasoning and document summarization, while RAG remains superior for cost efficiency, domain-specific tasks, and large-scale retrieval. Long-context processing can cost up to $20 per request at 200K-1M tokens, making RAG more economical. A hybrid approach combining both technologies shows promise, and cache-augmented generation (CAG) is emerging as an alternative that preloads knowledge into the extended context window for faster, more accurate responses.
17
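The cost gap in the long-context summary is simple arithmetic. The following is a worked sketch that assumes, purely for illustration, that the $20 figure corresponds to a full 1M-token prompt, which implies $20 per million input tokens. Real prices vary by model and provider, and output tokens and caching discounts are ignored. The RAG prompt size (five 500-token chunks plus a 200-token question) is also an assumption.

```python
PRICE_PER_MILLION = 20.0  # assumed USD per 1M input tokens, implied by "$20 for 1M tokens"


def prompt_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Input cost in USD for a prompt of the given token count."""
    return tokens * price_per_million / 1_000_000


long_context_tokens = 1_000_000
rag_tokens = 5 * 500 + 200  # 5 retrieved chunks of 500 tokens plus a 200-token question

print(f"long context: ${prompt_cost(long_context_tokens):.2f}")
print(f"RAG:          ${prompt_cost(rag_tokens):.3f}")
print(f"ratio:        {long_context_tokens / rag_tokens:.0f}x")
```

Under these assumptions, a RAG request costs about $0.054 against $20.00, roughly 370 times cheaper per request. The gap narrows as fewer documents need to be sent.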
12
Article
Daily Dose of Data Science | Avi Chawla | Substack·43w
Build the Ultimate MCP Server for Multimodal AI
A comprehensive guide to building an MCP (Model Context Protocol) server that enables multimodal AI capabilities across text, images, audio, and video. The tutorial demonstrates using Pixeltable as the multimodal AI infrastructure and CrewAI for orchestrating agent workflows. The system includes specialized agents for different modalities, a router agent for query classification, and a synthesis agent for response generation. The implementation supports RAG (Retrieval-Augmented Generation) operations across all media types through Docker-deployed MCP servers.
14
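The router agent in the multimodal MCP summary classifies each query and hands it to a modality-specific agent. In the article this is done by CrewAI agents. The sketch below swaps in a plain rule-based classifier: the attachment's file extension decides first, then keyword hints, and everything else goes to a text agent. The extension table, keyword lists, and function name are all hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from attachment file extensions to modality agents.
EXTENSION_TO_MODALITY = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}

# Hypothetical keyword hints, used only when no attachment decides the route.
KEYWORDS = {
    "image": ("photo", "picture", "image"),
    "audio": ("podcast", "recording", "audio"),
    "video": ("video", "clip", "footage"),
}


def route(query: str, attachment: Optional[str] = None) -> str:
    """Return the name of the modality agent that should handle the query."""
    if attachment:
        modality = EXTENSION_TO_MODALITY.get(PurePath(attachment).suffix.lower())
        if modality:
            return modality
    words = set(re.findall(r"[a-z]+", query.lower()))
    for modality, keywords in KEYWORDS.items():
        if words.intersection(keywords):
            return modality
    return "text"  # default: the text agent


print(route("What is said in this?", attachment="talk.MP3"))
print(route("Find the clip where the demo starts"))
print(route("Summarize the quarterly report"))
```

An LLM-based router handles ambiguous phrasing far better. A deterministic pre-check like this one is still cheap, and it is predictable for the common case where the file type already answers the question.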
13
Article
Javarevisited·45w
Think Like a Pro: Implementing RAG with Ollama and Spring AI
A comprehensive guide to building AI-powered support assistants using Spring AI's RAG framework with Ollama and Llama 3.2. Covers the complete implementation including vector store configuration, file upload handling, document processing, and chat functionality. Demonstrates how to keep sensitive data on-premises while enabling semantic search capabilities through embeddings and vector databases.
14
14
Article
Medium·45w
Chat with your documents tool — RAG (vector DBs + cosine sim.) & Claude API implementation
A detailed implementation of a RAG system for a law firm that processes 1TB of legal documents using vector embeddings, FAISS indexing, and Claude API. The system chunks documents, creates embeddings with a trilingual MiniLM model, performs cosine similarity search, and includes citation verification to prevent hallucinations. Key features include OCR processing, privacy-focused local deployment, sub-20ms query response times, and costs around $0.02 per query.
10
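The law-firm summary says the system chunks documents before embedding them, but it does not give the chunking parameters. Here is a minimal sketch of one common scheme, fixed-size word windows with overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The default sizes are assumptions, and the article may split differently, for example by tokens or sentences.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

Each chunk would then be embedded and added to the FAISS index. The overlap trades a modest amount of extra storage for fewer answers that straddle two chunks.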
