Reproducing GPT-2 (124M) with llm.c is possible on a single GPU, although training will take longer than on a multi-GPU machine.