This blog post is part of a research collaboration with UPenn and UCB that explores whether frontier models can be applied to one of the oldest database challenges: join ordering. To learn more, see the follow-up UCB blog post, "How do LLM agents think through SQL join orders?".

databricks

A research collaboration with UPenn and UCB explores using LLM-powered agents to tackle the classic database problem of join order optimization. Rather than placing an LLM in the hot path of a query optimizer, the prototype agent acts as an offline experimenter: given a budget of 50 iterations, it tests different join orderings using structured outputs and learns from the observed runtimes. On the Join Order Benchmark (JOB) with a scaled-up IMDb dataset, a frontier model achieved a 1.288x geomean latency improvement over the default optimizer, with P90 latency dropping by 41%. The agent also outperformed both perfect cardinality estimates and BayesQO. Key insight: LLMs excel at the iterative, exploratory tuning process that human experts perform manually, especially for queries with difficult predicates, such as LIKE filters, that confound traditional cardinality estimators.
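To make the offline-experimenter idea concrete, here is a minimal Python sketch of a fixed-budget tuning loop. It is an illustration under stated assumptions, not the prototype's actual code: `tune_join_order`, `random_propose`, `toy_measure`, and the table sizes are all hypothetical names and values. In particular, `random_propose` is a random stand-in for the LLM call, and `toy_measure` is a made-up cost function standing in for running the query and timing it. The `geomean` helper shows one common way to aggregate per-query speedups; the exact formula behind the reported numbers is assumed, not confirmed by the post.

```python
import math
import random


def geomean(values):
    """Geometric mean, a common way to aggregate per-query speedup ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def tune_join_order(tables, propose, measure, budget=50):
    """Try `budget` join orders and return the fastest one observed."""
    history = []  # (order, runtime) pairs fed back to the proposer
    best_order, best_runtime = None, math.inf
    for _ in range(budget):
        order = propose(tables, history)  # in the real system: an LLM call
        runtime = measure(order)          # in the real system: run the query
        history.append((order, runtime))
        if runtime < best_runtime:
            best_order, best_runtime = order, runtime
    return best_order, best_runtime


# Hypothetical stand-ins so the sketch runs end to end.
TABLES = ["title", "cast_info", "name", "movie_info"]
SIZES = {"title": 5, "cast_info": 9, "name": 3, "movie_info": 7}  # made up
rng = random.Random(0)


def random_propose(tables, history):
    """Stand-in for the agent: ignores history and shuffles the tables."""
    return rng.sample(list(tables), k=len(tables))


def toy_measure(order):
    """Made-up cost: joining larger tables earlier is penalized more."""
    return float(sum(SIZES[t] * (len(order) - i) for i, t in enumerate(order)))


best_order, best_runtime = tune_join_order(TABLES, random_propose, toy_measure)
print(best_order, best_runtime)
```

A real agent would differ mainly in `propose`: instead of shuffling, it would read `history` and return the next ordering to try as a structured output, which is where the learning from observed runtimes happens.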

Are LLM agents good at join order optimization?