Netflix built an internal post-training framework to scale LLM fine-tuning from experimentation to production. The framework abstracts infrastructure complexity across four dimensions: data (streaming, sequence packing, loss masking), model (sharding, LoRA, architecture support), compute (distributed job orchestration, checkpointing, MFU monitoring), and workflow (supporting both SFT and on-policy RL). Key engineering decisions include staying Hugging Face-compatible for interoperability, maintaining optimized internal model implementations for performance, and evolving from SPMD-only execution to hybrid orchestration for RL workflows. The system enables researchers to focus on modeling rather than distributed systems plumbing.
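To make the data-dimension techniques concrete, here is a minimal sketch of sequence packing combined with loss masking. The function name, the greedy packing strategy, and the use of plain integer token IDs are illustrative assumptions, not Netflix's actual API; it also assumes each example fits within `max_len`.

```python
PAD_ID = 0  # hypothetical padding token ID

def pack_examples(examples, max_len):
    """Greedily pack (prompt, completion) token-ID lists into fixed-length
    sequences. The loss mask is 1 only on completion tokens, so prompt
    tokens and padding do not contribute to the training loss."""
    sequences, cur_ids, cur_mask = [], [], []

    def flush():
        pad = max_len - len(cur_ids)
        sequences.append((cur_ids + [PAD_ID] * pad, cur_mask + [0] * pad))

    for prompt, completion in examples:
        ids = prompt + completion
        mask = [0] * len(prompt) + [1] * len(completion)
        if cur_ids and len(cur_ids) + len(ids) > max_len:
            flush()  # current sequence is full; start a new one
            cur_ids, cur_mask = [], []
        cur_ids = cur_ids + ids
        cur_mask = cur_mask + mask
    if cur_ids:
        flush()
    return sequences
```

Packing amortizes padding waste across many short examples, while the mask keeps the loss well-defined regardless of how examples are concatenated.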

13 min read · From netflixtechblog.com
Table of contents
- Introduction
- A Model Developer's Post-Training Journey
- The Netflix Post-Training Framework
- Learnings from Building the Post-Training Framework
- Wrap up
- Acknowledgements
