Microsoft's A11y LLM Eval project benchmarks AI models on accessibility by running automated testing tools against the code they generate. The results show that color contrast failures remain the most common issue, likely because the models learn from existing poor code rather than calculating contrast ratios. The benchmark also reveals that automated testing has limited capability and only
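
To make "calculating contrast ratios" concrete, here is a minimal sketch of the WCAG 2.x contrast ratio formula (relative luminance of each color, then (L1 + 0.05) / (L2 + 0.05)). The function names are hypothetical and are not taken from the A11y LLM Eval project or the original post; the 4.5:1 and 3:1 thresholds are the WCAG AA minimums for normal and large text.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#767676'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check the WCAG AA minimum: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

A check like this only covers solid foreground and background colors; text over images, gradients, or semi-transparent layers still needs other tooling or human review.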

5 min read. From cerovac.com
Table of contents
Color contrast failures are (still) most common
Helping AI to deliver better accessibility, again
A reminder to conclude – human knowledge is essential

Author: Bogdan Cerovac
