A research collaboration between UPenn and UC Berkeley explores using LLM-powered agents to tackle the classic database problem of join order optimization. Rather than placing an LLM in the hot path of a query optimizer, the prototype agent acts as an offline experimenter: given a budget of 50 iterations, it proposes candidate join orderings via structured outputs, runs them, and learns from the observed runtimes. On the Join Order Benchmark (JOB) with a scaled-up IMDb dataset, a frontier model achieved a 1.288x geometric-mean latency improvement over the default optimizer, with P90 latency dropping by 41%. The agent also outperformed both plans built from perfect cardinality estimates and BayesQO. Key insight: LLMs excel at the iterative, exploratory tuning process that human experts otherwise perform manually, especially for queries with difficult predicates, such as LIKE filters, that confound traditional cardinality estimators.
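The offline loop described above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: `tune_join_order`, `random_swap_proposer`, `fake_latency`, and the row counts in `ROWS` are all hypothetical names and values, the proposer is a random swap standing in for the LLM's structured output, and the latency function is a toy cost model standing in for real query execution. `geomean_speedup` shows how a geometric-mean figure such as 1.288x would be computed from per-query latencies.

```python
import math
import random
from typing import Callable, Sequence

JoinOrder = tuple[str, ...]
History = list[tuple[JoinOrder, float]]

def tune_join_order(
    tables: Sequence[str],
    propose: Callable[[History], JoinOrder],
    run_query: Callable[[JoinOrder], float],
    budget: int = 50,
) -> tuple[JoinOrder, float]:
    """Propose an order, measure it, record it, and return the best one seen."""
    best_order: JoinOrder = tuple(tables)      # baseline: the order as given
    best_latency = run_query(best_order)
    history: History = [(best_order, best_latency)]
    for _ in range(budget):
        candidate = propose(history)           # an LLM call in the article's setup
        latency = run_query(candidate)         # observed runtime is the feedback
        history.append((candidate, latency))
        if latency < best_latency:
            best_order, best_latency = candidate, latency
    return best_order, best_latency

def geomean_speedup(baseline: Sequence[float], tuned: Sequence[float]) -> float:
    """Geometric mean of per-query baseline/tuned latency ratios."""
    ratios = [b / t for b, t in zip(baseline, tuned)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# --- hypothetical stand-ins so the sketch runs without a database or an LLM ---
_rng = random.Random(0)

def random_swap_proposer(history: History) -> JoinOrder:
    """Swap two tables in the best order so far (placeholder for the agent)."""
    best_order, _ = min(history, key=lambda entry: entry[1])
    order = list(best_order)
    i, j = _rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]
    return tuple(order)

ROWS = {"title": 1000, "cast_info": 5000, "name": 200, "movie_info": 3000}  # made-up sizes

def fake_latency(order: JoinOrder) -> float:
    """Toy cost model: tables joined earlier are weighted more heavily."""
    return float(sum((len(order) - i) * ROWS[t] for i, t in enumerate(order)))

best_order, best_latency = tune_join_order(list(ROWS), random_swap_proposer, fake_latency)
```

Because the loop runs offline, each candidate costs a full query execution but never adds latency to live traffic; a real setup would replace `fake_latency` with timed execution of the hinted plan and `random_swap_proposer` with a model call that receives the history.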

8m read time · From databricks.com
Table of contents
Introduction · The Problem: Join Ordering · Agentic join ordering · Join Us · Notes
