Laravel's team ran a benchmark called Boost Benchmarks, testing six AI models (Claude Haiku 4.5, Claude Sonnet 4.6, Claude Opus 4.6, Kimi K2.5, GPT-5.3 Codex, and GPT-5.4) against 17 real Laravel tasks, with and without Laravel Boost (an MCP server that provides AI coding context). GPT-5.3 Codex and GPT-5.4 tied at 16/17 evaluations.