A hands-on exploration of which local LLM sampling parameters make a noticeable difference in output quality. Testing Qwen 3.5 9B in LM Studio, the author finds three things. Temperature is the most impactful setting, with 0.7 recommended for general use. Presence penalty (0.7 to 1.0) prevents repetitive output better than repeat penalty does. Min-P (around 0.1) is the best companion to high temperature, outperforming Top-K and Top-P for dynamic token filtering. Repeat penalty is best left at its neutral value of 1.0 to avoid instability on smaller models.
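To make the difference between these knobs concrete, here is a minimal Python sketch of one sampling step over a toy vocabulary. It assumes llama.cpp-style definitions: repeat penalty divides positive logits (and multiplies negative ones) for tokens already seen, presence penalty subtracts a flat amount from those tokens, and Min-P is applied before temperature. LM Studio's exact internals and sampler ordering may differ. The function names and toy logits are hypothetical, chosen only for illustration.

```python
import math
import random


def apply_penalties(logits, history, repeat_penalty=1.0, presence_penalty=0.0):
    """Return a copy of `logits` with repetition penalties applied.

    Assumed llama.cpp-style semantics (not necessarily LM Studio's):
    - repeat_penalty is multiplicative; 1.0 leaves logits unchanged.
    - presence_penalty is a flat subtraction; 0.0 leaves logits unchanged.
    """
    out = dict(logits)
    for tok in set(history):  # each seen token is penalized once, regardless of count
        if tok not in out:
            continue
        # Repeat penalty always makes a seen token less likely, whatever the logit's sign.
        out[tok] = out[tok] / repeat_penalty if out[tok] > 0 else out[tok] * repeat_penalty
        # Presence penalty: the same fixed cost for appearing once or many times.
        out[tok] -= presence_penalty
    return out


def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def min_p_filter(logits, min_p):
    """Keep tokens whose probability is at least `min_p` times the top token's."""
    probs = softmax(logits)
    threshold = min_p * max(probs.values())
    return {t: v for t, v in logits.items() if probs[t] >= threshold}


def sample(logits, history, temperature=0.7, min_p=0.1,
           repeat_penalty=1.0, presence_penalty=0.0, rng=random):
    """Run one hypothetical sampling step; return (chosen token, final probabilities)."""
    logits = apply_penalties(logits, history, repeat_penalty, presence_penalty)
    logits = min_p_filter(logits, min_p)  # assumed order: prune the tail first
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding to avoid dividing by zero.
        best = max(logits, key=logits.get)
        return best, {best: 1.0}
    probs = softmax({t: v / temperature for t, v in logits.items()})
    tokens = list(probs)
    choice = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return choice, probs


if __name__ == "__main__":
    toy = {"the": 3.0, "a": 2.5, "cat": 1.0, "zebra": -2.0}
    # Even at a high temperature, Min-P has already removed the unlikely "zebra".
    _, probs = sample(toy, history=[], temperature=1.5, rng=random.Random(0))
    print(sorted(probs))
    # A presence penalty on the already-used "the" lets "a" overtake it.
    _, probs = sample(toy, history=["the"], presence_penalty=0.8, rng=random.Random(0))
    print(max(probs, key=probs.get))
```

Because the Min-P cutoff is relative to the top token's probability, in this ordering the long tail is pruned before temperature flattens the distribution, so raising temperature only reshuffles probability among plausible candidates. Note also that with `repeat_penalty=1.0` and `presence_penalty=0.0`, `apply_penalties` is a no-op.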

6 min read, from xda-developers.com
Table of contents
A quick caveat
Temperature is the most important setting
Repeat and Presence penalties
Min-P is the key to working with high temperature
