NanoGPT Slowrun

NanoGPT Slowrun is an open benchmark from Q Labs focused on data-efficient learning algorithms for language models. Speedrun benchmarks optimize wall-clock time. Slowrun instead fixes the training data at 100M tokens of FineWeb and places no limit on compute, and the winning algorithm is the one that reaches the lowest validation loss. Within days, community contributions raised data efficiency relative to modded-nanogpt from 2.4x to 5.5x.

Key findings so far:
- The Muon optimizer outperforms AdamW, SOAP, and MAGMA.
- Multi-epoch training needs aggressive regularization: weight decay up to 16x the standard value, plus dropout.
- The data should be reshuffled at the start of each epoch.
- Learned value-embedding projections, SwiGLU activations, and model ensembling all improve results.

Open research directions include second-order optimizers, diffusion models, curriculum learning, and alternatives to gradient descent. The project's short-term goal is 10x data efficiency, and it may reach 100x by the end of the year.
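Muon is the most algorithmic of these findings, so a small sketch may help show what it does. Muon keeps a momentum buffer for each 2D weight matrix. Before the update is applied, that buffer is replaced with an approximately orthogonal matrix produced by a Newton-Schulz iteration. The pure-Python sketch below shows only this idea and makes several simplifications. It uses the classical cubic Newton-Schulz iteration instead of the tuned quintic coefficients in the public Muon implementation. It uses plain momentum rather than Nesterov momentum and omits shape-dependent learning-rate scaling. It also adds decoupled weight decay to connect with the regularization finding. The function names `orthogonalize` and `muon_step`, and the default hyperparameters, are hypothetical and are not taken from the Slowrun code.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30, eps: float = 1e-7) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Classical cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X. After dividing by
    the Frobenius norm, every nonzero singular value lies in (0, 1]. Each
    iteration pushes those values toward 1 and leaves the singular vectors
    unchanged.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
            for i in range(len(x))
        ]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95,
              weight_decay: float = 0.0) -> tuple[Matrix, Matrix]:
    """One simplified Muon-style update for a single 2D weight matrix.

    Returns (new_weight, new_momentum). Defaults are illustrative only.
    """
    # Heavy-ball momentum buffer: m <- beta * m + g (no Nesterov correction).
    new_momentum = [
        [beta * m + g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(momentum, grad)
    ]
    # Use the approximately orthogonalized momentum as the update direction.
    update = orthogonalize(new_momentum)
    # Decoupled weight decay (shrink toward zero), then the orthogonalized step.
    new_weight = [
        [w * (1.0 - lr * weight_decay) - lr * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, update)
    ]
    return new_weight, new_momentum
```

For example, `orthogonalize([[3.0, 1.0], [1.0, 2.0]])` approaches the identity matrix. That is the polar factor of a symmetric positive-definite matrix, so all of its singular values end up equal. This equalization is the intended effect: every direction in the update gets a similar step size, whatever the gradient's scale along it. The public Muon implementation recommends this kind of update only for hidden-layer weight matrices. Embeddings, the output head, and scalar or vector parameters are left to a standard optimizer such as AdamW.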

#machine-learning

#llm

Mar 04 • 3m read time • From qlabs.sh

Table of contents

What we've found so far
Update: 5.5x Data Efficiency
Directions we think are wide open
