Microsoft's A11y LLM Eval project benchmarks AI models on accessibility by testing their generated code with automated tools. Results show color contrast failures remain the most common issue, likely because AI learns from existing poorly accessible code rather than calculating contrast ratios. The benchmark also reveals that automated testing is limited in scope: it only checks static code patterns. Model Context Protocol (MCP) tools can help AI generate more accessible code by checking contrast ratios before delivery. However, passing automated tests doesn't guarantee actual accessibility or usability—human knowledge remains essential throughout the design, development, content creation, and testing stages.
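The contrast-ratio check described above is straightforward to automate. Below is a minimal sketch of the WCAG 2.x formula in Python; the function names are illustrative and not taken from the A11y LLM Eval project or any particular MCP tool.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def linearize(channel):
        c = channel / 255
        # Piecewise sRGB-to-linear conversion from the WCAG 2.x definition.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two sRGB colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# WCAG AA requires at least 4.5:1 for normal-size text.
# Example: mid-gray #767676 on a white background.
print(round(contrast_ratio((118, 118, 118), (255, 255, 255)), 2))
```

A tool exposed to the model could run a check like this over every foreground/background pair in generated CSS and reject combinations below 4.5:1 before the code is delivered.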

5 min read · From cerovac.com
Table of contents
- Color contrast failures are (still) most common
- Helping AI to deliver better accessibility, again
- A reminder to conclude – human knowledge is essential

Author: Bogdan Cerovac
