"Shake" LLMs to make them better...?


New AI research called 'neural thickets' reveals that after pre-training, large language model weights don't settle at a single optimal solution but reside in a dense region surrounded by alternative specialist models. By slightly perturbing weights with Gaussian noise ('jiggling'), the model can be nudged toward task-specific specialists — improving math, coding, or writing capabilities. This effect only manifests in very large models, which have sufficient parameter space to contain these hidden specialists. The implication is that a pre-trained model is effectively a neighborhood of latent specialists, and random weight sampling with outcome-based evaluation can directly yield improved models.
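The sampling loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's procedure: the function name `jiggle_and_select`, the noise scale `sigma`, the sample count, and the stand-in `evaluate` benchmark are all assumptions made here for demonstration.

```python
import numpy as np

def jiggle_and_select(weights, evaluate, n_samples=32, sigma=0.01, seed=0):
    """Sample Gaussian perturbations of a weight vector and keep the best.

    `evaluate` maps a weight vector to a task score (higher is better).
    Both the noise scale and the greedy selection rule are illustrative
    assumptions, not the exact method from the research.
    """
    rng = np.random.default_rng(seed)
    best_w, best_score = weights, evaluate(weights)
    for _ in range(n_samples):
        # 'Jiggle': add small Gaussian noise to every weight.
        candidate = weights + rng.normal(0.0, sigma, size=weights.shape)
        # Outcome-based evaluation: score the perturbed model on the task.
        score = evaluate(candidate)
        if score > best_score:
            best_w, best_score = candidate, score
    return best_w, best_score

# Toy stand-in for a task benchmark: the score peaks at a nearby
# "specialist" solution rather than at the pre-trained weights.
target = np.array([0.02, -0.01, 0.03])
evaluate = lambda w: -float(np.linalg.norm(w - target))

pretrained = np.zeros(3)
specialist, score = jiggle_and_select(pretrained, evaluate)
```

Because the loop only accepts candidates that score higher, the returned model never evaluates worse than the starting weights; in a real setting `evaluate` would be an expensive benchmark run, which is why the claimed result (that cheap random sampling finds specialists at all) is the interesting part.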
