Inference-time scaling improves LLM answer quality by allocating more compute during text generation rather than during training. The article categorizes the main approaches, including chain-of-thought prompting, self-consistency, best-of-N ranking, rejection sampling, self-refinement, and search over solution paths.
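As a concrete illustration of one of these techniques, self-consistency samples several independent answers from the model and returns the majority vote. The sketch below is a minimal, hypothetical version: `noisy_model` stands in for one stochastic LLM generation, since the article itself does not provide an implementation.

```python
import random
from collections import Counter

def self_consistency(sample_answer, n=10, seed=0):
    """Self-consistency sketch: draw n candidate answers and majority-vote.

    `sample_answer` is a hypothetical stand-in for a single stochastic
    LLM generation (e.g., one chain-of-thought sample at temperature > 0).
    Returns the winning answer and the fraction of samples that agreed.
    """
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n

# Toy sampler: "model" answers "42" about 70% of the time, otherwise errs.
def noisy_model(rng):
    return "42" if rng.random() < 0.7 else rng.choice(["41", "43"])

answer, agreement = self_consistency(noisy_model, n=100)
```

With enough samples, majority voting recovers the model's most common answer even though any single sample may be wrong, which is the core idea behind self-consistency.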
4 min read · From sebastianraschka.com