Production-grade RAG systems require more than basic vector search. Hybrid search combines dense vector search (semantic similarity) with sparse full-text search (BM25) to improve recall, merging results using algorithms like Reciprocal Rank Fusion (RRF) or weighted average scoring. A reranking model (cross-encoder) then takes the top 50–100 hybrid search candidates and re-scores them by evaluating the query and document together, surfacing the most relevant 5–10 chunks for the LLM. The post includes a PostgreSQL SQL example implementing hybrid search with RRF using pgvector and tsvector, and explains where each stage (embeddings, hybrid search, reranking) belongs in the architecture stack.
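
To make the fusion and reranking steps concrete, here is a minimal Python sketch. It is not the post's PostgreSQL implementation. The function names, the constant `k=60` (a commonly used RRF default) and the `toy_overlap_score` scorer are all assumptions made for illustration. In a real system a cross-encoder model would take the place of the toy scorer.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def rerank(query, candidates, score_fn, top_n=5):
    """Re-score (doc_id, text) pairs against the query and keep the best top_n.

    score_fn stands in for a cross-encoder: it sees the query and the
    document text together and returns a relevance score.
    """
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def toy_overlap_score(query, text):
    """Hypothetical stand-in scorer: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Hypothetical result lists from the dense (vector) and sparse (BM25) searches.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists (`doc_a` and `doc_b` here) rise to the top of `fused`, because RRF works from rank positions alone and never compares the raw dense and sparse scores, which sit on different scales. The fused list would then be passed to `rerank`, which narrows it to the few chunks sent to the LLM.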

10m read time · From ubuntu.com
Table of contents
Embeddings and vector search
Hybrid search
Reranking
Wrap-up
