Reproducing GPT-2 (124M) in llm.c is possible on a single GPU, although training takes longer than with multiple GPUs.

16m read time. From github.com