A comparison of two approaches to evaluating search quality: query-based evaluation (aggregating clicks per query into relevance labels) vs. session-based evaluation (replaying individual user sessions). Session-based eval offers better sampling accuracy by treating each user interaction equally, similar to probability-based polling, and preserves time-sensitive features like dynamic pricing for learning-to-rank training. However, it sacrifices per-query debuggability. The post recommends using both approaches for different purposes: session-based for simulated A/B testing and query-based for diagnosing specific query failures.
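
As a rough illustration of the difference, here is a minimal Python sketch. It is not the post's code: the toy `sessions` log, the `reverse_ranker` stand-in, and the choice of click-through rate and mean reciprocal rank as metrics are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy click log: one dict per user session (assumed shape).
sessions = [
    {"query": "shoes", "shown": ["a", "b", "c"], "clicked": "b"},
    {"query": "shoes", "shown": ["a", "b", "c"], "clicked": "b"},
    {"query": "shoes", "shown": ["a", "b", "c"], "clicked": "a"},
    {"query": "red hat", "shown": ["x", "y"], "clicked": "y"},
]

def query_based_labels(sessions):
    """Aggregate clicks per query into a click-through rate per (query, doc)."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for s in sessions:
        for doc in s["shown"]:
            impressions[(s["query"], doc)] += 1
        clicks[(s["query"], s["clicked"])] += 1
    return {key: clicks[key] / n for key, n in impressions.items()}

def session_based_mrr(sessions, ranker):
    """Replay each session through `ranker`; every session counts exactly once."""
    total = 0.0
    for s in sessions:
        ranking = ranker(s["query"], s["shown"])
        # Reciprocal rank of the document this user actually clicked.
        total += 1.0 / (ranking.index(s["clicked"]) + 1)
    return total / len(sessions)

def reverse_ranker(query, candidates):
    """Stand-in for a candidate ranker: reverses the originally shown order."""
    return list(reversed(candidates))

labels = query_based_labels(sessions)
mrr = session_based_mrr(sessions, reverse_ranker)
```

In this sketch the query-based path yields one label per query and document, which is easy to inspect when a specific query goes wrong. The session-based path weights the three "shoes" sessions three times as heavily as the single "red hat" session, mirroring how often users actually issued each query.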

8m read time. From softwaredoug.com
Table of contents
But there’s a different way - session based eval
Final thoughts - porque no los dos?
