Thanks to Michael Fairchild and Microsoft, we now have some accessibility benchmarks for a selected range of AI models, which were tested with automatic accessibility tools after generating sample code. The project is called A11y LLM Eval, and it provides quite a few insights, both directly and indirectly. When I checked …

Bogdan on A11y

Microsoft's A11y LLM Eval project benchmarks AI models on accessibility by testing the code they generate with automatic tools. The results show that color contrast failures remain the most common, likely because AI learns from existing poor code rather than calculating ratios. The benchmark also reveals that automatic testing has limited capability and only checks static code patterns. Model Context Protocol (MCP) tools can help AI generate more accessible code by checking contrast ratios before delivery. However, passing automatic tests doesn't guarantee actual accessibility or usability: human knowledge remains essential throughout design, development, content creation, and testing.

AI will soon deliver code that will pass automatic testing by default – Bogdan on Digital Accessibility (A11y)

Color contrast failures are (still) most common
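
To make "calculating ratios" concrete, here is a minimal sketch of the WCAG 2.x contrast ratio calculation that automatic tools perform and that a model usually skips. The function names (`relative_luminance`, `contrast_ratio`, `passes_aa`) are my own, chosen for illustration, and the sketch assumes plain six-digit hex colors with no transparency. It is not taken from A11y LLM Eval or from any specific tool.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color such as '#1a2b3c'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA minimum: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # Two nearly identical grays on white land on opposite sides of the threshold.
    for fg in ("#767676", "#777777"):
        print(fg, round(contrast_ratio(fg, "#ffffff"), 2), passes_aa(fg, "#ffffff"))
```

The two grays in the usage example differ by a single step per channel, yet only `#767676` reaches 4.5:1 on white. That is the kind of difference a model copying color values from existing code cannot see without actually doing the arithmetic.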

Helping AI to deliver better accessibility, again

A reminder to conclude – human knowledge is essential