A comprehensive system design walkthrough for building a personalized video search system on AWS with machine learning. The architecture combines a text-based search engine (Elasticsearch/OpenSearch with inverted indexes) with a visual search engine built on a dual-encoder model: a BERT encoder for text queries and a ViT encoder for video frame embeddings, trained jointly with contrastive learning. The walkthrough covers the full ML lifecycle: data collection strategies (user interactions, LLM-generated negatives, human annotation), feature engineering, SageMaker Feature Store design, offline and online inference pipelines, serverless endpoint deployment with traffic-splitting rollout, model monitoring, ground-truth label collection, and an end-to-end serving architecture with a fusion and ranking layer.
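The dual-encoder training objective mentioned above can be sketched as a symmetric in-batch contrastive (InfoNCE, CLIP-style) loss: each (text, video) pair in a batch is a positive, and every other pairing serves as a negative. This is a minimal NumPy sketch under that assumption; the function name, temperature value, and batch setup are illustrative, not from the source.

```python
import numpy as np

def normalize(x):
    """L2-normalize embeddings along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_style_loss(text_emb, video_emb, temperature=0.07):
    """Symmetric in-batch contrastive loss over (text, video) pairs.

    Diagonal entries of the similarity matrix are positives;
    off-diagonal entries act as in-batch negatives.
    """
    t = normalize(text_emb)
    v = normalize(video_emb)
    logits = t @ v.T / temperature            # (B, B) cosine similarities
    idx = np.arange(len(logits))

    def xent(l):
        # Row-wise log-softmax, then negative log-likelihood of the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the text-to-video and video-to-text directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matched text/video embeddings should score a much lower loss than mismatched ones, which is what contrastive training optimizes for.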