Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post.
Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant improvements in model performance while raising many research questions. This post reviews recent developments in how to use test-time compute (i.e. “thinking time”) effectively and why it helps.

Lilian Weng is a machine learning researcher and writer who shares insights, research findings, and tutorials on machine learning, artificial intelligence, and data science. Through articles, blog posts, and research summaries, Lilian Weng explores topics such as deep learning, natural language processing, and reinforcement learning. Readers can learn about state-of-the-art algorithms, practical applications of machine learning, and trends shaping the field of AI.

Lil’Log

Recent developments in AI have focused on using test-time compute and chain-of-thought (CoT) to improve model performance by emulating human thinking, which involves both fast and slow modes of thought. Techniques like parallel sampling and sequential revision are being explored to improve decoding, and reinforcement learning shows promise for training models capable of advanced reasoning. Integrating external tools such as code interpreters further boosts problem-solving capabilities. The interpretability and accuracy of these methods are crucial, because biases can arise if models do not faithfully represent their reasoning processes.

Why We Think