A GSoC 2025 project that built an end-to-end semantic video search engine capable of finding specific moments within videos using natural language queries. The system uses a two-part architecture: an ingestion pipeline that processes videos with AI models (TransNetV2, WhisperX, BLIP, VideoMAE) to extract shots, transcripts,
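The two-part architecture summarized above can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes that each extracted moment (for example, a shot together with its transcript and caption text) is reduced to a single text string. An offline "ingest" step turns each moment into a vector, and a "search" step ranks moments against the query by cosine similarity. The names `Moment`, `embed`, `ingest`, and `search` are hypothetical. The bag-of-words `embed` is only a stand-in for the learned encoders the project actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Moment:
    """One searchable segment of a video (hypothetical schema)."""
    video_id: str
    start_s: float
    end_s: float
    text: str  # assumption: transcript and caption text merged into one string


def embed(text: str) -> Counter:
    # Stand-in for a learned text encoder: a simple bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty.
    dot = sum(a[tok] * b[tok] for tok in a if tok in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ingest(moments: list[Moment]) -> list[tuple[Moment, Counter]]:
    # Act 1 (offline): precompute one vector per moment.
    return [(m, embed(m.text)) for m in moments]


def search(index: list[tuple[Moment, Counter]], query: str, k: int = 3) -> list[Moment]:
    # Act 2 (online): embed the query and return the k most similar moments.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [m for m, _ in ranked[:k]]


index = ingest([
    Moment("vid1", 0.0, 4.2, "a dog catches a frisbee in the park"),
    Moment("vid1", 4.2, 9.8, "the presenter talks about the weather forecast"),
    Moment("vid2", 12.0, 15.5, "a cat sleeps on a sofa"),
])
for m in search(index, "dog playing frisbee", k=1):
    print(m.video_id, m.start_s, m.end_s)  # prints: vid1 0.0 4.2
```

Splitting the work this way keeps the expensive per-video processing out of the query path. At query time, only the short query needs to be embedded and compared against vectors that were computed ahead of time.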
Table of contents
The Problem: Beyond Keywords
The Big Picture: A Two-Act Play
Part 1: The Ingestion Pipeline - Teaching the Machine to Watch TV
Part 2: The Search Application - Reaping the Rewards
The Final Result & GSoC Experience