We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

OpenAI

OpenAI introduces a framework and 13 evaluations to measure chain-of-thought monitorability in AI reasoning models. The research compares monitoring a model's internal reasoning chain with monitoring its outputs alone, and finds that frontier models remain fairly monitorable and that longer reasoning improves monitorability. A key finding is a tradeoff between model size and reasoning effort: a smaller model run at higher reasoning effort can match a larger model's capabilities while being easier to monitor, though at increased inference cost. The study also explores how reinforcement learning and pretraining scale affect monitorability, and shows that asking follow-up questions can further improve monitoring effectiveness.

Evaluating chain-of-thought monitorability