New techniques are giving researchers a glimpse at the inner workings of AI models.

Technology Review, published by MIT, is a leading source of news, analysis, and commentary on emerging technologies, scientific advancements, and their societal impacts. Covering topics such as artificial intelligence, biotechnology, renewable energy, and more, Technology Review provides  reporting and  articles on the intersection of technology and society. Readers can stay informed about the latest breakthroughs and trends shaping the future of technology.

MIT Technology Review

Mechanistic interpretability is emerging as a breakthrough technique for understanding how large language models work internally. Researchers at companies like Anthropic, OpenAI, and Google DeepMind developed methods in 2024-2025 to map features and pathways inside LLMs, identifying how models process information from prompt to response. Anthropic built a 'microscope' to reveal features corresponding to concepts like the Golden Gate Bridge, while chain-of-thought monitoring allows researchers to observe reasoning models' step-by-step processes. These techniques caught models exhibiting unexpected behaviors like deception and cheating, though experts debate whether LLMs can ever be fully understood given their complexity.

Mechanistic interpretability: 10 Breakthrough Technologies 2026