Best of RAG — March 2026

1. Decoupled by Design: Billion-Scale Vector Search
Databricks · 11w
Databricks redesigned its vector search infrastructure to handle billion-scale datasets by decoupling storage from compute. The new Storage Optimized endpoints combine three pieces: an IVF (Inverted File Index) index in place of HNSW; distributed K-means and Product Quantization (PQ) built on PySpark and JAX; and a Rust query engine with two runtimes, one for async I/O and one for CPU-bound computation. Key results: billion-vector indexes built in under 8 hours (20x faster), up to 7x lower serving costs, and 90%+ recall at 1 billion vectors. Query latency is about 300–500 ms, versus 20–50 ms for the memory-resident Standard endpoints. This is a deliberate trade-off that favors scale and cost over ultra-low latency. The architecture rests on three interdependent bets: storage-compute separation, distributed indexing with a compatible index format, and aggressive compression via Product Quantization.
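The IVF plus Product Quantization combination described above can be illustrated at toy scale. The following is a minimal single-machine sketch in pure Python, not Databricks' implementation: the names `kmeans`, `nearest`, `IVFPQIndex`, and the parameters `n_lists`, `n_subspaces`, `n_codes`, and `n_probe` are hypothetical, and the distributed PySpark/JAX training and the Rust query engine are out of scope. Coarse k-means splits the vectors into inverted lists, each vector's residual from its list centroid is compressed into one small code per subspace, and a query scans only the `n_probe` closest lists using precomputed per-subspace distance tables.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: squared_l2(v, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[Vector]:
    """Plain single-machine Lloyd's k-means (the article distributes this step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class IVFPQIndex:
    """Toy IVF index whose lists store PQ codes of residuals, not raw vectors."""

    def __init__(self, n_lists: int, n_subspaces: int, n_codes: int) -> None:
        self.n_lists, self.m, self.n_codes = n_lists, n_subspaces, n_codes

    def _split(self, v: Vector) -> List[Vector]:
        # Cut a vector into m equal sub-vectors (dimension must divide evenly).
        step = len(v) // self.m
        return [v[i * step:(i + 1) * step] for i in range(self.m)]

    def build(self, vectors: List[Vector]) -> None:
        # 1) Coarse quantizer: k-means centroids define the inverted lists.
        self.coarse = kmeans(vectors, self.n_lists)
        assign = [nearest(v, self.coarse) for v in vectors]
        residuals = [[x - y for x, y in zip(v, self.coarse[a])] for v, a in zip(vectors, assign)]
        # 2) One PQ codebook per subspace, trained on residual sub-vectors.
        self.codebooks = [
            kmeans([self._split(r)[s] for r in residuals], self.n_codes, seed=s)
            for s in range(self.m)
        ]
        # 3) Each list stores (vector id, one code per subspace).
        self.lists: Dict[int, List[Tuple[int, List[int]]]] = {c: [] for c in range(self.n_lists)}
        for vid, (r, a) in enumerate(zip(residuals, assign)):
            codes = [nearest(sub, self.codebooks[s]) for s, sub in enumerate(self._split(r))]
            self.lists[a].append((vid, codes))

    def search(self, query: Vector, k: int = 5, n_probe: int = 2) -> List[int]:
        # Scan only the n_probe lists whose centroids are closest to the query.
        probe = sorted(range(self.n_lists), key=lambda c: squared_l2(query, self.coarse[c]))[:n_probe]
        scored: List[Tuple[float, int]] = []
        for c in probe:
            residual = [x - y for x, y in zip(query, self.coarse[c])]
            # Asymmetric distance: table[s][j] = distance from query sub-vector s to code j.
            table = [[squared_l2(sub, code) for code in self.codebooks[s]]
                     for s, sub in enumerate(self._split(residual))]
            for vid, codes in self.lists[c]:
                scored.append((sum(table[s][j] for s, j in enumerate(codes)), vid))
        scored.sort()
        return [vid for _, vid in scored[:k]]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    index = IVFPQIndex(n_lists=4, n_subspaces=2, n_codes=16)
    index.build(data)
    print(index.search(data[0], k=5, n_probe=2))
```

Raising `n_probe` scans more lists, trading latency for recall, and storing only small integer codes instead of full-precision vectors is the source of PQ's memory savings.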

2. Generative AI for Beginners .NET: Version 2 on .NET 10
ASP.NET Blog · 9w
Version 2 of the free, open-source course 'Generative AI for Beginners .NET' has been released, completely rebuilt on .NET 10. The curriculum has been restructured into five focused lessons covering generative AI fundamentals, practical techniques (chat completions, prompt engineering, RAG, function calling), AI application patterns, multi-agent systems using the Microsoft Agent Framework, and responsible AI. The primary AI abstraction has shifted from Semantic Kernel to Microsoft.Extensions.AI (MEAI), which aligns with .NET 10 patterns such as dependency injection. The RAG samples have been rewritten using native SDKs, 11 legacy Semantic Kernel samples have been moved to deprecated status, and all eight language translations have been updated.

3. Vibe Coding with AI: Best Practices for Human-AI Collaboration in Software Development
Towards Data Science · 10w
Explores best practices for human-AI collaboration in software development using vibe coding tools. Key risks include garbage-in, garbage-out prompting, low-quality prompts that burn through model usage limits, and the AI's tendency to over-engineer solutions. Using a RAG system over news articles as a worked example, the author demonstrates a workflow: define clear requirements with test queries, generate the architecture before any code, validate and stress-test the design against edge cases, have the AI critique its own output, and push back on unnecessary complexity. The central principle is a human-in-the-loop cycle in which AI accelerates the work but humans remain the final arbiter of trade-offs, maintainability, and production readiness.

4. State of Context Engineering in 2026
SwirlAI · 9w
Context engineering has matured into a core AI engineering discipline. Five key patterns now define how production agents manage their context windows: (1) Progressive Disclosure via Agent Skills loads instructions in tiers based on relevance rather than all upfront; (2) Context Compression uses a sliding window plus LLM summarization to shrink accumulated history; (3) Context Routing classifies queries and directs them to the right knowledge source before anything enters the context window; (4) Retrieval Evolution moves from fixed RAG pipelines to agent-controlled loops such as Agentic RAG, Graph RAG, and Self-RAG; (5) Tool and Capability Management addresses the hidden token cost of MCP tool schemas (90 tools can consume 50K+ tokens). Each pattern targets a different failure mode, and production systems layer all five together. The article offers practical starting points for each scenario.
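The sliding-window-plus-summarization idea in pattern (2) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not code from the SwirlAI article: `Message`, `compress_history`, and `naive_summarize` are hypothetical names, and `naive_summarize` merely truncates text where a real system would call an LLM to write the summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str     # e.g. "system", "user", "assistant"
    content: str


def naive_summarize(messages: List[Message]) -> str:
    """Hypothetical stand-in for an LLM summarization call: keeps a short prefix of each message."""
    return " | ".join(f"{m.role}: {m.content[:40]}" for m in messages)


def compress_history(
    history: List[Message],
    window: int,
    summarize: Callable[[List[Message]], str] = naive_summarize,
) -> List[Message]:
    """Keep the last `window` messages verbatim and fold everything older into one summary."""
    if window < 0:
        raise ValueError("window must be non-negative")
    if len(history) <= window:
        return list(history)  # nothing to compress yet
    split = len(history) - window  # avoids the history[-0:] pitfall when window == 0
    older, recent = history[:split], history[split:]
    summary = Message("system", "Summary of earlier conversation: " + summarize(older))
    return [summary] + recent


if __name__ == "__main__":
    chat = [Message("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(10)]
    for m in compress_history(chat, window=3):
        print(m.role, "->", m.content)
```

Because the summary is itself an ordinary message, calling `compress_history` again folds the previous summary into the next one, so the returned history never exceeds `window + 1` messages.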
