Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning Author: Keertana Chidambaram, Qiuling Xu, Ko-Jen Hsiao, Moumita Bhattacharya (*The work was done when Keertana …

Netflix TechBlog

Netflix researchers introduce Advantage-Weighted Supervised Finetuning (A-SFT), a post-training algorithm for generative recommender systems. RLHF methods developed for LLMs do not transfer directly to recommendation, which poses three challenges of its own: a lack of counterfactual data, noisy reward models, and unknown logging policies. A-SFT combines supervised finetuning with advantage reweighting, so it can use the directional signal from an uncertain reward model without relying on inverse propensity scoring. Benchmarked against PPO, DPO, IPO, and CQL, A-SFT scores higher on the offline evaluation metrics (NDCG, HR, MRR) while avoiding overfitting to noisy reward signals.
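For readers unfamiliar with the offline metrics named above, here is a minimal sketch of how NDCG, HR, and MRR are commonly computed when each user has a single held-out target item. The single-target setup, the cutoff `k`, and the function names are assumptions for illustration; the post's actual evaluation protocol is not described here.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item: the ideal DCG is 1, so the score
    reduces to 1 / log2(rank + 1), where rank is 1-based."""
    for idx, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(idx + 2)
    return 0.0

def reciprocal_rank(ranked_items, target):
    """1 / rank of the target (1-based), or 0.0 if it is not ranked."""
    for idx, item in enumerate(ranked_items):
        if item == target:
            return 1.0 / (idx + 1)
    return 0.0

def mean_metric(metric_fn, rankings, targets, **kwargs):
    """Average a per-user metric over all users (MRR = mean reciprocal rank)."""
    scores = [metric_fn(r, t, **kwargs) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)
```

All three metrics reward placing the held-out item near the top of the list. HR only checks membership in the top k, while NDCG and MRR also discount by position.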

Challenges in Post-training for Recommendation

Advantage Weighted Supervised Fine Tuning
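
The summary describes A-SFT as supervised finetuning combined with advantage reweighting. The sketch below illustrates that general idea only and is not the authors' implementation. It rests on several assumptions not stated in the source: the advantage is a reward-model score minus a baseline, the per-example weight is a clipped exponential of that advantage (as in advantage-weighted regression), and the loss is normalized by the sum of weights. All names (`advantage_weighted_sft_loss`, `temperature`, `max_weight`) are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def advantage_weighted_sft_loss(logits_batch, targets, rewards, baselines,
                                temperature=1.0, max_weight=10.0):
    """Weighted negative log-likelihood of the logged next items.

    Assumed form: weight = min(exp((reward - baseline) / temperature), max_weight).
    Examples the reward model scores above the baseline are up-weighted and
    the rest are down-weighted; no propensity score appears anywhere.
    """
    weighted_nll, weight_sum = 0.0, 0.0
    for logits, target, reward, baseline in zip(
            logits_batch, targets, rewards, baselines):
        advantage = reward - baseline
        weight = min(math.exp(advantage / temperature), max_weight)
        nll = -log_softmax(logits)[target]  # standard SFT loss term
        weighted_nll += weight * nll
        weight_sum += weight
    return weighted_nll / weight_sum
```

When all advantages are zero, every weight is 1 and the loss reduces to plain SFT. A larger `temperature` flattens the weights toward that case, which is one way to limit how strongly a noisy reward model can influence training. Clipping with `max_weight` serves the same purpose for outlier reward estimates.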