Inference-time scaling improves LLM answer quality by allocating more compute during text generation rather than during training. The article categorizes the main approaches, including chain-of-thought prompting, self-consistency, best-of-N ranking, rejection sampling, self-refinement, and search over solution paths. Major LLM providers use these techniques, which can boost accuracy significantly without changing model weights. The piece draws on research done for a book chapter, in which these techniques improved base-model accuracy from 15% to 52%.
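To make one of these techniques concrete, here is a minimal sketch of self-consistency: sample several answers from the model at nonzero temperature, then keep the most frequent one. The `sample_answer` stub and its answer pool are hypothetical stand-ins for a real model API call; only the majority-vote decision rule is the technique itself.

```python
from collections import Counter
import random

def majority_vote(answers):
    """Return the most frequent answer (the self-consistency decision rule)."""
    return Counter(answers).most_common(1)[0][0]

def sample_answer(prompt, rng):
    # Hypothetical stand-in for sampling an LLM at temperature > 0;
    # a real system would call a model API here.
    return rng.choice(["42", "42", "42", "41"])

def self_consistency(prompt, n=20, seed=0):
    # Draw n independent samples, then take the majority answer.
    rng = random.Random(seed)
    return majority_vote(sample_answer(prompt, rng) for _ in range(n))
```

The same skeleton extends to best-of-N ranking by replacing the majority vote with a scoring model that picks the highest-rated candidate.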

4 min read · From sebastianraschka.com