Can agents replace the search stack?
An empirical investigation into whether AI agents can replace traditional search stacks (query understanding + reranking). On Amazon ESCI benchmarks, a basic BM25/embedding backend paired with GPT-4o-mini and GPT-5 agents lifts NDCG from 0.289 to 0.453 with minimal engineering effort. Key findings: agents mostly call search tools once per query, though keyword search nudges them toward more exploratory multi-query behavior; forcing more diverse tool calls lets smaller models approach GPT-5 performance; and specialized agentic search models like SID-1 show promise as drop-in RAG replacements. However, for information retrieval tasks (MSMarco), agents offer no improvement over well-trained embedding models: the retriever knows best when the LLM lacks the underlying knowledge.
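Since the results above are reported as NDCG gains, here is a minimal sketch of how NDCG@k is typically computed for a ranked list of graded relevance judgments (the function names and the example relevance grades are illustrative, not taken from the post):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each graded relevance is
    # discounted by log2 of its (1-indexed) rank position + 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    # Normalize DCG of the system's ranking by the DCG of the
    # ideal (descending-relevance) ordering of the same items.
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg
```

A perfectly ordered list scores 1.0, and any misordering of relevant items pulls the score below that, which is why a jump from 0.289 to 0.453 represents a substantial reranking improvement.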
Table of contents
- Surprisingly good results with simple tool usage
- But, agents call search approximately… once?
- Encouraging exploration improves further
- What about agentic search models (ie SID-1)?
- “Finding” different than Deep Research