An empirical investigation into whether stricter JSON schemas for MCP tool parameters improve agent reliability. Using a FastMCP expense-logging server with multiple schema variants (bare string, Annotated descriptions, Literal/Enum types, regex patterns), the author ran evaluations across 17 test cases, multiple OpenAI models (gpt-4o, gpt-4.1-mini, gpt-5.3-codex), reasoning effort levels, and two agent frameworks (Pydantic AI and GitHub Copilot SDK). Key findings: combining enum constraints with descriptions (Annotated[Enum]) yielded the best category accuracy but at double the token cost; date schema strictness had zero measurable impact on frontier models; model choice and reasoning effort level mattered more than schema strictness; and both agent frameworks produced identical results. The conclusion is that modern frontier models are well-trained for tool calling and mostly need clarity for ambiguous fields rather than strict type constraints, though stricter types still benefit server-side code quality.
Table of contents
A basic MCP tool and schema
Annotating parameters with descriptions
Constraining parameters with types
Setting up evaluations
Evaluation results: category
Evaluation results: date
Cross-model evaluations
Impact of reasoning effort
Comparing agent frameworks
Takeaways
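The schema variants compared in the post can be sketched as plain Python signatures. This is a hypothetical illustration, not the author's actual FastMCP server code: the function names, enum values, and description strings are made up, and the real server would decorate these functions as MCP tools.

```python
from enum import Enum
from typing import Annotated, Literal, get_type_hints


class Category(str, Enum):
    # Hypothetical category set; the post's actual values may differ.
    FOOD = "food"
    TRAVEL = "travel"
    OFFICE = "office"


# Variant 1: bare string -- the generated JSON schema gives the model
# no guidance about valid values.
def log_expense_bare(category: str, date: str) -> str: ...


# Variant 2: Annotated descriptions -- guidance in the schema's
# "description" field, but no hard constraint on values.
def log_expense_annotated(
    category: Annotated[str, "Expense category, e.g. 'food' or 'travel'"],
    date: Annotated[str, "Date of the expense in YYYY-MM-DD format"],
) -> str: ...


# Variant 3: Literal/Enum types -- the schema enumerates the only
# valid values, so the model must pick one of them.
def log_expense_enum(
    category: Category,
    date: Annotated[str, "Date of the expense in YYYY-MM-DD format"],
) -> str: ...


# Variant 4: Annotated[Enum] -- constraint plus description, the
# combination the post found most accurate (at higher token cost).
def log_expense_annotated_enum(
    category: Annotated[Category, "Expense category"],
    date: Annotated[str, "Date of the expense in YYYY-MM-DD format"],
) -> str: ...
```

Frameworks like FastMCP derive each tool's JSON schema from these type hints, which is why tightening the Python annotation tightens the schema the model sees.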