Netflix's editorial teams face a needle-in-a-haystack problem when searching thousands of hours of raw footage. This post details the architecture behind their multimodal video search system, which fuses outputs from multiple specialized AI models (character recognition, scene detection, dialogue transcription) into a unified,

11m read timeFrom netflixtechblog.com
Post cover image

Sort: