Pseudo-relevance feedback is an information retrieval technique that treats the top N results of an initial search as a 'foreground' corpus, compares them against the full background corpus, and extracts anomalous or frequent terms to expand the original query. Two main approaches are covered: embedding-based expansion (Bag of Documents) and term-frequency-based expansion (Semantic Knowledge Graph). A practical example using a 'dining room table' query illustrates both the promise (finding related terms like tablecloth and placemat) and the pitfalls (spurious terms from noisy fields). The key takeaway is that corpus quality is critical: clean fields yield better candidate expansions, and there is no shortcut around good content hygiene.
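
The foreground-versus-background comparison can be made concrete with a small term-frequency sketch. This is an illustration under stated assumptions, not the post's implementation and not the Semantic Knowledge Graph itself: the toy corpus `DOCS`, the overlap-based `search` ranker, and the helper `expansion_terms` are all hypothetical names invented here. Terms are scored by how much more often they appear in the top N results than in the whole corpus.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "dining room table with tablecloth",
    "oak dining table and placemat set",
    "dining room table tablecloth and placemat",
    "office desk lamp",
    "garden hose and sprinkler",
    "bedroom lamp and rug",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def search(query, docs, n):
    """Toy ranker: order docs by how many distinct query tokens they contain."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(d))), i) for i, d in enumerate(docs)]
    hits = [(score, i) for score, i in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then doc order
    return [i for _, i in hits[:n]]


def expansion_terms(query, docs, n=3, k=3):
    """Return up to k candidate expansion terms for the query.

    Foreground = top n results of the initial search.
    Background = the full corpus.
    Score = foreground document rate minus background document rate.
    """
    fg_ids = search(query, docs, n)
    if not fg_ids:
        return []
    # Document frequency of each term in the foreground and the background.
    fg_df = Counter(t for i in fg_ids for t in set(tokenize(docs[i])))
    bg_df = Counter(t for d in docs for t in set(tokenize(d)))
    query_terms = set(tokenize(query))
    scores = {}
    for term, df in fg_df.items():
        if term in query_terms:
            continue  # do not "expand" with terms already in the query
        scores[term] = df / len(fg_ids) - bg_df[term] / len(docs)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


print(expansion_terms("dining room table", DOCS, n=3, k=2))
```

On this toy corpus the two strongest candidates are "placemat" and "tablecloth", while a common word such as "and" scores zero because it is just as frequent in the background. Asking for more candidates (a larger `k`) starts to surface incidental terms like "oak" or "with", a small-scale version of the noisy-field pitfall described above.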

2m read time. From softwaredoug.com