SQL query generation from natural language

A Microsoft ISE team evaluated several AI agent approaches for converting natural-language questions into SQL queries against poorly documented, messy databases. They tested GitHub Copilot CLI (with Claude Sonnet 4.5 and Gemini 3.0), Microsoft Agent Framework (with GPT-5 Mini), and Azure Databricks AI/BI Genie, reaching up to roughly 75-80% accuracy. Key findings: runtime query execution is essential (removing it dropped accuracy to 38%), schema metadata and domain hints significantly improve performance, and model choice matters (Claude Sonnet 4.5 outperformed GPT-5 Mini by about 11 percentage points). The main remaining failure mode is business logic errors: semantic misunderstandings that require domain expertise rather than technical fixes. Practical takeaways: start with schema documentation and runtime validation, design evaluation criteria early, and budget for iterative review by domain experts.
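The summary does not include the team's code. As a rough illustration of the ideas it names, here is a minimal Python sketch, assuming a SQLite database and a hypothetical `generate` callable that stands in for whichever model or agent produces SQL. Schema metadata plus domain hints go into the prompt, each candidate query is actually executed, database errors are fed back for a retry, and correctness is checked by comparing result sets against a reference query. All function names, the toy `orders` table, and the hint text are illustrative assumptions, not the article's implementation or any product's API, and result-set comparison is only one possible evaluation criterion.

```python
import sqlite3
from collections import Counter
from typing import Callable, Optional

# Hypothetical stand-in for a model or agent call: takes a prompt string and
# returns SQL text. Not the API of any product named in the article.
Generator = Callable[[str], str]


def schema_context(conn: sqlite3.Connection, hints: dict[str, str]) -> str:
    """Collect CREATE TABLE statements and append domain hints as SQL comments."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    lines = [sql for (sql,) in rows]
    lines += [f"-- hint: {name}: {note}" for name, note in hints.items()]
    return "\n".join(lines)


def generate_with_execution(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    hints: Optional[dict[str, str]] = None,
    max_attempts: int = 3,
) -> tuple[Optional[str], Optional[list]]:
    """Run each candidate query; feed database errors back to the model and retry.

    Assumes read-only queries; a real deployment should use a read-only connection.
    """
    context = schema_context(conn, hints or {})
    feedback = ""
    for _ in range(max_attempts):
        sql = generate(f"{context}\n\nQuestion: {question}\n{feedback}")
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            feedback = f"Previous query failed: {sql}\nError: {exc}\nPlease fix it."
    return None, None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Execution-based check: same rows as the reference query, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    return Counter(predicted) == Counter(conn.execute(gold_sql).fetchall())


def scripted_generator(responses: list[str]) -> tuple[Generator, list[str]]:
    """Hypothetical test double: replays canned SQL and records every prompt."""
    prompts: list[str] = []
    replies = iter(responses)

    def generate(prompt: str) -> str:
        prompts.append(prompt)
        return next(replies)

    return generate, prompts


# Toy database; table, columns, and hint text are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, customer TEXT, amt REAL);
    INSERT INTO orders VALUES (1, 'a', 10.0), (2, 'b', 5.0), (3, 'a', 2.5);
    """
)
generate, prompts = scripted_generator([
    "SELECT SUM(amount) FROM orders WHERE customer = 'a'",  # wrong column name
    "SELECT SUM(amt) FROM orders WHERE customer = 'a'",
])
sql, rows = generate_with_execution(
    conn, "How much has customer a spent?", generate, hints={"amt": "order amount in USD"}
)
print(sql, rows)
```

The retry loop only catches queries that fail to run. A query such as `SELECT SUM(amt) FROM orders` executes cleanly yet answers the wrong question, which is the kind of business logic error the summary says still needs domain expert review; an execution-based comparison against a reference query flags it, but only if someone with domain knowledge wrote that reference.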

May 07 • 13 min read • From devblogs.microsoft.com

Table of contents: Introduction, Research Foundation, Dataset, Approach and Solution, Evaluation Methodology, Experiments, Findings, Conclusion
