Many of us are familiar with the retrieval-augmented generation (RAG) pattern for building agentic AI applications, such as digital concierges, frontline support chatbots, and agents that can help with basic self-service troubleshooting. At a high level, the flow for RAG is fairly clear: the user’s prompt is augmented with some relevant […]


Ubuntu

Production-grade RAG systems require more than basic vector search. Hybrid search combines dense vector search (semantic similarity) with sparse full-text search (BM25) to improve recall, merging the two result lists with an algorithm such as Reciprocal Rank Fusion (RRF) or weighted average scoring. A reranking model (a cross-encoder) then takes the top 50–100 hybrid search candidates and re-scores each one by evaluating the query and document together, surfacing the 5–10 most relevant chunks for the LLM. The post includes a PostgreSQL example implementing hybrid search with RRF using pgvector and tsvector, and explains where each stage (embeddings, hybrid search, reranking) belongs in the architecture stack.
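To make the fusion and reranking stages concrete, here is a minimal Python sketch. It is not the post's PostgreSQL example. It assumes each retriever returns an ordered list of document IDs, and it uses k = 60 as the RRF smoothing constant, a common default rather than a value taken from the post. The function names, the sample documents, and `toy_overlap_score` are hypothetical; the latter is only a stand-in for a real cross-encoder, which would score each (query, document) pair with a model.

```python
from collections import defaultdict
from typing import Callable, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs using Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k = 60 is an assumed default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def toy_overlap_score(query: str, text: str) -> float:
    """Hypothetical placeholder scorer: counts shared lowercase tokens.

    A real reranker would call a cross-encoder model here instead.
    """
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def rerank(
    query: str,
    candidates: Sequence[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Re-score (doc_id, text) candidates against the query, keep the top_n."""
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    # sorted() is stable, so equal scores keep their fused order.
    return sorted(scored, key=lambda item: -item[1])[:top_n]


# Illustrative data only: two retrievers returning ranked document IDs.
dense_hits = ["d3", "d1", "d7"]   # e.g. from vector similarity search
sparse_hits = ["d1", "d9", "d3"]  # e.g. from BM25 full-text search
docs = {
    "d1": "reset your password from the login page",
    "d3": "password policy and rotation rules",
    "d7": "billing cycles explained",
    "d9": "how to reset a forgotten password",
}

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
candidates = [(doc_id, docs[doc_id]) for doc_id, _ in fused]
top_chunks = rerank("reset password", candidates, toy_overlap_score, top_n=2)
print([doc_id for doc_id, _ in fused])
print([doc_id for doc_id, _ in top_chunks])
```

RRF uses only rank positions, so the dense and sparse scores never need to be normalized onto a common scale. Weighted average scoring, by contrast, requires comparable scores from both retrievers.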

Hybrid search and reranking: a deeper look at RAG