The AI Coding Contest Day 12 pitted 10 major language models against each other in a Word Gem Puzzle — a sliding-tile letter game on grids up to 30×30. Kimi K2.6, an open-weights model from Chinese startup Moonshot AI, won outright with 22 match points and a 7-1-0 record. Xiaomi's MiMo V2-Pro came second, while GPT-5.5, GLM 5.1, and Claude Opus 4.7 placed third through fifth. The contest revealed that models capable of active tile-sliding outperformed static word-scanners on larger boards. Notably, Muse scored −15,309 by claiming every short word despite heavy scoring penalties. The author argues this result reflects a narrowing capability gap between open-weights Chinese models and Western frontier labs, with Kimi K2.6 scoring 54 on the Artificial Analysis Intelligence Index versus GPT-5.5's 60 and Claude's 57.

From thinkpol.ca