Netflix's engineering team describes their LLM-as-a-Judge system for automatically evaluating show synopsis quality at scale. The system scores synopses across four quality dimensions (tone, clarity, precision, factuality) using a combination of techniques: per-criteria dedicated judges, tiered rationales (extended reasoning

10 min read · From netflixtechblog.com
Table of contents
Introduction
The Making of a “Good” Synopsis
Scaling Quality Scoring with LLM-as-a-Judge
Member Validation of LLM-as-a-Judge
Closing Remarks
