Best of RAG — August 2025

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·40w
8 RAG Architectures for AI Engineers
Eight different RAG (Retrieval-Augmented Generation) architectures are explained with their specific use cases: Simple Vector RAG for basic semantic matching, Multi-modal RAG for cross-modal retrieval, HyDE for handling dissimilar queries, Self-RAG for validation against trusted sources, Graph RAG for structured relationships, Hybrid RAG combining vector and graph approaches, Adaptive RAG for dynamic query handling, and Agentic RAG for complex workflows with AI agents.
2
Article
Javarevisited·40w
Top 5 Vector Databases to Learn in 2025 (with Courses and Books to Master Them)
Vector databases have become essential infrastructure for AI applications in 2025, powering semantic search, RAG systems, and recommendation engines. The top 5 vector databases to learn are Pinecone (production-ready managed service), Weaviate (open-source with hybrid search), ChromaDB (lightweight local option), FAISS (industry-standard similarity search library), and Qdrant (high-performance Rust-based solution). Each database has specific strengths and learning resources including Udemy courses, Coursera programs, and technical books to help developers master these technologies for building modern GenAI applications.
3
Article
Meilisearch·40w
9 advanced RAG techniques to know & how to implement them
Advanced RAG techniques optimize retrieval-augmented generation systems beyond basic implementations. Nine key techniques include text chunking (semantic vs fixed-size), reranking with cross-encoders, metadata filtering, hybrid search combining keyword and vector methods, query rewriting for better intent understanding, autocut for dynamic text trimming, context distillation for focused summaries, and fine-tuning both LLMs and embedding models. These methods address common issues like noisy results, irrelevant context, and poor ranking. Implementation tools include Meilisearch for hybrid search, LangChain for workflow orchestration, Weaviate for vector search, and Pinecone for scalable vector databases. Evaluation focuses on retrieval accuracy, latency, precision-recall balance, and user satisfaction metrics.
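As a rough illustration of the fixed-size side of the chunking comparison, here is a minimal sketch. The function name `fixed_size_chunks`, the character-based sizing, and the `overlap` parameter are assumptions made for this example; they are not taken from the article or from Meilisearch.

```python
from typing import List


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters.

    Consecutive windows share `overlap` characters (an assumption of this
    sketch) so that a sentence cut at a boundary still appears whole in
    at least one chunk when it is shorter than the overlap.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Hypothetical input: 500 characters, windows of 200 with 50 shared.
chunks = fixed_size_chunks("a" * 500, size=200, overlap=50)
```

Semantic chunking would instead place boundaries at sentence or topic breaks, which this sketch does not attempt.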
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·42w
Make RAG systems 32x Memory Efficient!
Binary quantization can make RAG systems 32x more memory efficient by converting float32 embeddings to binary vectors. The technique involves ingesting documents, generating binary embeddings, storing them in a vector database like Milvus, and using Hamming distance for retrieval. A complete implementation demonstrates querying 36M+ vectors in under 30ms using LlamaIndex, Milvus, and Groq for inference, with deployment via Beam Cloud.
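The 32x figure follows from storing 1 bit per dimension instead of a 32-bit float. A minimal NumPy sketch of the idea, assuming a simple sign threshold at zero and random stand-in embeddings (the article's own pipeline uses LlamaIndex and Milvus, which are not reproduced here; `binarize` and `hamming_distances` are hypothetical helper names):

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Keep one bit per dimension (1 if the value is positive), packed 8 per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distances(query_code: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and every packed document."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


# Stand-in data: 1000 random float32 "embeddings" of dimension 1024.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 1024)).astype(np.float32)
codes = binarize(docs)

# 4 bytes per dimension become 1 bit per dimension.
compression = docs.nbytes / codes.nbytes

# A slightly perturbed copy of document 42 should still retrieve document 42.
query = docs[42] + 0.1 * rng.standard_normal(1024).astype(np.float32)
nearest = int(np.argmin(hamming_distances(binarize(query), codes)))
```

Hamming distance on packed bits is cheap to compute, which is what makes this representation attractive for large collections; the trade-off is some loss of ranking precision compared with full float32 similarity.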
5
Article
DigitalOcean Community·39w
Context Engineering: Moving Beyond Prompting in AI
Context engineering is an advanced approach to working with large language models that goes beyond simple prompt crafting. It involves strategically managing the entire context window with curated information including task descriptions, examples, retrieved documents, conversation history, and external data. Unlike prompt engineering, which focuses on clever single-line instructions, context engineering manages knowledge flow, memory systems, and information retrieval to build production-grade AI applications. The approach addresses context window limitations through techniques like chunking, filtering, and dynamic knowledge injection, making it essential for enterprise AI systems and autonomous agents that require consistent, accurate outputs.
6
Article
Weaviate·41w
Elysia: Building an end-to-end agentic RAG app
Elysia is an open-source agentic RAG framework that goes beyond traditional text-only AI assistants by using a decision tree architecture, dynamic data display formats, and intelligent data analysis. Built with Python and powered by Weaviate, it features transparent decision-making processes, chunk-on-demand document processing, personalized feedback learning, and multi-model routing. The framework can be used as both a web application and a Python library, offering customizable tools and real-time observability of AI reasoning processes.
7
Article
ByteByteGo·38w
EP178: The Lifecycle of a Kubernetes Pod
Covers the complete lifecycle of Kubernetes pods from creation to termination, including API server submission, scheduling, kubelet preparation, container states, and cleanup. Also explores CI/CD pipeline automation, open-source RAG stack components, software versioning strategies (SemVer, CalVer, Sequential, API), and the testing pyramid structure with unit, integration, and end-to-end tests.
8
Article
Meilisearch·40w
10 Best RAG Tools and Platforms: Full Comparison [2025]
A comprehensive comparison of 10 RAG tools and platforms for 2025, including Meilisearch, LangChain, RAGatouille, Verba, Haystack, Embedchain, LlamaIndex, MongoDB, Pinecone, and Vespa. Each tool is analyzed with key features, pricing, integrations, pros/cons based on user reviews, and ideal use cases. The guide covers open-source options, enterprise solutions, and search engine tools, providing selection criteria including retrieval methods, performance, scalability, integration ease, deployment options, cost considerations, and community support.
9
Article
Daily Dose of Data Science | Avi Chawla | Substack·39w
Corrective RAG Agentic Workflow
Corrective RAG (CRAG) enhances traditional RAG systems by adding a self-assessment step that evaluates retrieved document relevance before generating responses. The workflow searches documents, uses an LLM to assess context relevance, retains only relevant information, performs web search when needed, and aggregates context for final response generation. The implementation uses a tech stack including Firecrawl for web search, Milvus for vector storage, Beam for deployment, and LlamaIndex workflows for orchestration, with observability through CometML's Opik.
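The workflow described above can be sketched as a single function. This is a simplified outline under stated assumptions, not the article's LlamaIndex implementation: `corrective_rag`, its callable parameters, and the `min_relevant` threshold that decides when web search is "needed" are all hypothetical, and the stubs below stand in for a real retriever, LLM grader, web search, and generator.

```python
from typing import Callable, List


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    min_relevant: int = 1,
) -> str:
    """Retrieve, grade each document, fall back to web search, then generate."""
    docs = retrieve(query)
    # Self-assessment step: keep only documents the grader marks as relevant.
    kept = [d for d in docs if is_relevant(query, d)]
    # Assumed trigger: search the web when too few documents survive grading.
    if len(kept) < min_relevant:
        kept = kept + web_search(query)
    # Aggregate the surviving context and produce the final response.
    return generate(query, kept)


# Stub components for illustration only.
corpus = ["Milvus stores vectors.", "Bananas are yellow."]
answer = corrective_rag(
    "What does Milvus store?",
    retrieve=lambda q: corpus,
    is_relevant=lambda q, d: "milvus" in d.lower(),
    web_search=lambda q: ["(web result placeholder)"],
    generate=lambda q, ctx: " | ".join(ctx),
)
fallback = corrective_rag(
    "What does Milvus store?",
    retrieve=lambda q: corpus,
    is_relevant=lambda q, d: False,
    web_search=lambda q: ["(web result placeholder)"],
    generate=lambda q, ctx: " | ".join(ctx),
)
```

Passing the components as callables keeps the control flow visible and lets each stage be swapped for a real service without changing the function.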
10
Article
Towards Data Science·41w
LangGraph + SciPy: Building an AI That Reads Documentation and Makes Decisions
A comprehensive tutorial on building an AI agent that helps users choose appropriate statistical tests by combining LangGraph for multi-step decision making with RAG (Retrieval-Augmented Generation) using SciPy documentation. The agent classifies user questions, searches embedded documentation when needed, provides recommendations, and generates sample Python code. The implementation includes ChromaDB for vector storage, OpenAI GPT-4 for language processing, and a Streamlit frontend for user interaction.
11
Article
Daily Dose of Data Science | Avi Chawla | Substack·42w
Build a Multimodal Agentic RAG
A comprehensive guide to building a multimodal agentic RAG system that processes both documents and audio files using speech input. The tutorial covers the complete workflow from data ingestion and audio transcription with AssemblyAI, to embedding storage in Milvus vector database, and orchestration with CrewAI Flows. The system allows users to query information using voice commands, with agents retrieving relevant context and generating cited responses. The implementation includes deployment using Beam for serverless containers and a Streamlit interface for user interaction.
12
Article
Towards Data Science·42w
Context Engineering — A Comprehensive Hands-On Tutorial with DSPy
Context Engineering is a systematic approach to building production-ready LLM applications by breaking complex problems into modular subproblems handled by specialized agents. The tutorial demonstrates using the DSPy framework to implement structured outputs, multi-step workflows, tool calling, and RAG systems. Key concepts include sequential processing, iterative refinement, conditional branching, and advanced techniques like query rewriting, HyDE, and multi-hop search. Production considerations cover evaluation design, monitoring, structured outputs, and failure handling with tools like MLflow and Langfuse for observability.
13
Article
SwirlAI·38w
Breaking Down Context Engineering
Context Engineering is the practice of providing minimal, focused context to AI agents at each step of their execution. The article breaks down the types of context that need management: system prompts, user prompts, retrieved context, short-term memory, long-term memory, tools, and structured output. Each type presents unique challenges like context poisoning, token limits, relevance filtering, and format reliability. The practice evolved from prompt engineering to address complex multi-turn interactions and tool usage in production AI systems.
