A comprehensive test of modern LLMs reveals surprising inconsistencies in their ability to count letters in words. Most models correctly identify the 3 r's in "strawberry" and the 2 b's in "blueberry," but GPT-5 Chat fails the blueberry test 73% of the time, often confidently claiming there are 3 b's. The study tested multiple models.
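For reference, the ground-truth counts are trivial to compute deterministically. The sketch below is illustrative only and is not the study's harness. `count_letter` and `failure_rate` are hypothetical helper names. The code assumes each model answer has already been parsed into an integer, and it scores a run as the fraction of answers that miss the true count.

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of one letter in a word (the ground truth)."""
    return word.lower().count(letter.lower())


def failure_rate(answers: list[int], expected: int) -> float:
    """Fraction of parsed model answers that differ from the expected count.

    Hypothetical scoring helper: assumes answers were already extracted
    from the model's text output as integers.
    """
    if not answers:
        raise ValueError("no answers to score")
    wrong = sum(1 for a in answers if a != expected)
    return wrong / len(answers)


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))  # 3
    print(count_letter("blueberry", "b"))   # 2
    # Hypothetical answers from four trials: two say 3 b's, two say 2.
    print(failure_rate([3, 2, 3, 2], count_letter("blueberry", "b")))  # 0.5
```

Under this scoring, a 73% failure rate on "blueberry" would mean that 73 of every 100 parsed answers differ from 2.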