A comprehensive test of modern LLMs reveals surprising inconsistencies in their ability to count letters in words. While most models correctly identify 3 r's in "strawberry" and 2 b's in "blueberry," GPT-5 Chat fails the blueberry test 73% of the time, often confidently claiming there are 3 b's. The study tested multiple models, including GPT-5 variants, Claude, and Gemini, across 274 trials. Because some models perform perfectly while others struggle on the same task, tokenization limitations alone don't fully explain these failures.
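For reference, the ground-truth answers the study grades against are trivial to verify programmatically, which is part of what makes the LLM failures notable. A minimal sketch (`count_letter` is an illustrative helper, not code from the study):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # → 3
print(count_letter("blueberry", "b"))   # → 2
```

A deterministic check like this is exactly what an LLM lacks: it sees tokens rather than characters, so it must recall or infer spellings rather than count them directly.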

10 min read · From minimaxir.com
