Spotify developed a method to adapt large language models for personalized content recommendations by introducing Semantic IDs: compact tokens that encode relationships between catalog items and user behaviors. The approach builds catalog-native representations from textual and behavioral signals, aligns them with an open-weight LLM's vocabulary, and fine-tunes the model on personalization tasks. The domain-adapted 1B-parameter model achieved up to a 1.96× improvement over baselines in episode recommendations, with multi-task training providing an additional 22% boost. The system enables explainable recommendations while maintaining real-time performance through a serving infrastructure built on vLLM and Redis-backed key-value stores.
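To make the vocabulary-alignment step more concrete, here is a minimal sketch of how Semantic IDs might be rendered as extra tokens and appended to a model's vocabulary. It is not Spotify's implementation. The catalog, the two-level code layout, the `<sid_level_index>` token format, and the helper names `sid_tokens`, `extend_vocab`, and `encode_history` are all assumptions made for illustration. A toy dictionary stands in for a real tokenizer.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: item -> Semantic ID, modeled here as a tuple of
# codebook indices. Real codes would be learned from textual and
# behavioral signals; these values are made up.
CATALOG: Dict[str, Tuple[int, ...]] = {
    "episode:a": (3, 17),
    "episode:b": (3, 42),  # shares its first-level code with episode:a
    "episode:c": (8, 5),
}


def sid_tokens(code: Tuple[int, ...]) -> List[str]:
    """Render a code as one token per level, e.g. (3, 17) -> <sid_0_3>, <sid_1_17>."""
    return [f"<sid_{level}_{index}>" for level, index in enumerate(code)]


def extend_vocab(base_vocab: Dict[str, int],
                 catalog: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Return a copy of the vocabulary with missing Semantic ID tokens appended."""
    vocab = dict(base_vocab)
    for code in catalog.values():
        for tok in sid_tokens(code):
            if tok not in vocab:
                vocab[tok] = len(vocab)  # next free token id
    return vocab


def encode_history(history: List[str],
                   catalog: Dict[str, Tuple[int, ...]],
                   vocab: Dict[str, int]) -> List[int]:
    """Turn a list of consumed items into a flat sequence of token ids."""
    return [vocab[tok] for item in history for tok in sid_tokens(catalog[item])]


# Toy stand-in for the base LLM vocabulary (assumption).
base_vocab = {"<pad>": 0, "user": 1, "listened": 2, "to": 3}
vocab = extend_vocab(base_vocab, CATALOG)
ids = encode_history(["episode:a", "episode:b"], CATALOG, vocab)
print(ids)
```

In this sketch, items with a common first-level code share a token, which is one way compact tokens could carry relationships between catalog items. With a real model, the embedding matrix would also need to be resized to cover the new token ids before fine-tuning.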

15m read time. From research.atspotify.com
Table of contents
A Spotify catalog-native vocabulary for LLMs
Domain Specific Training: Learning personalization tasks
Evaluating LLMs that speak Spotify
Scaling
Serving
Conclusion
Acknowledgments
