Netflix researchers introduce Advantage-Weighted Supervised Fine-tuning (A-SFT), a novel post-training algorithm for generative recommender systems. Unlike traditional RLHF methods used in LLMs, A-SFT addresses unique challenges in recommendation systems: lack of counterfactual data, noisy reward models, and unknown logging

12m read timeFrom netflixtechblog.com
Post cover image
Table of contents
IntroductionChallenges in Post-training for RecommendationGet Netflix Technology Blog’s stories in your inboxAdvantage Weighted Supervised Fine TuningBenchmarksOffline Evaluation Results

Sort: