Inference-time scaling improves LLM answer quality by allocating more compute during text generation rather than during training. The article categorizes approaches including chain-of-thought prompting, self-consistency, best-of-N ranking, rejection sampling, self-refinement, and search over solution paths.
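One of the listed techniques, self-consistency, can be illustrated with a minimal sketch: sample several candidate answers and majority-vote over them. The `sample_answer` stub below stands in for a real LLM call and is a hypothetical placeholder, not code from the article:

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_answer: Callable[[], str], n: int = 5) -> str:
    """Sample n candidate answers and return the most frequent one (majority vote)."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in for an LLM sampler: cycles through canned answers.
_canned = iter(["42", "42", "41", "42", "40"])
result = self_consistency(lambda: next(_canned), n=5)
print(result)  # → "42", the majority answer among the five samples
```

Best-of-N ranking works similarly but replaces the majority vote with a learned scorer (e.g., a reward model) that picks the highest-scoring candidate.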

4-minute read, from sebastianraschka.com
