Research comparing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training foundation models, using Llama-3.2-Vision-11B on arithmetic and navigation tasks, finds that SFT tends to memorize its training data and fail on out-of-distribution examples, while RL enables genuine generalization.