Teaching Large Language Models to Speak Spotify: How Semantic IDs Enable Personalization

Spotify developed a method for adapting large language models to personalized content recommendation by introducing Semantic IDs: compact tokens that encode relationships between catalog items and user behavior. The approach has three steps: build catalog-native representations from textual and behavioral signals, align those representations with the vocabulary of an open-weight LLM, and fine-tune the model on personalization tasks. The domain-adapted 1B-parameter model achieved up to a 1.96× improvement over baselines on episode recommendations, and multi-task training added a further 22% improvement. The system produces explainable recommendations and keeps real-time latency through a serving stack built on vLLM and Redis-backed key-value stores.
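The summary does not say how Semantic IDs are constructed. One widely used technique for building such IDs is residual quantization: an item embedding is snapped to the nearest codeword of a first codebook, the leftover residual is snapped to a second codebook, and so on, so each item becomes a short, coarse-to-fine sequence of codes that can be rendered as new LLM tokens. The sketch below assumes that technique and assumes each item already has an embedding derived from its textual and behavioral signals. The toy codebooks, the `<sid_level_code>` token format, and every function name are hypothetical illustrations, not Spotify's implementation; the dictionary returned by `build_lookup` merely stands in for the key-value store that maps generated IDs back to catalog items.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]
Codebook = List[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Return the index of the codeword closest to v (squared Euclidean distance)."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(v: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Encode v as one code per level; each level quantizes the residual left by the previous one."""
    residual = list(v)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword so the next level only models what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def to_tokens(codes: Sequence[int]) -> List[str]:
    """Render codes as hypothetical special tokens; the level prefix keeps levels distinct."""
    return [f"<sid_{level}_{code}>" for level, code in enumerate(codes)]


def vocabulary_tokens(codebook_sizes: Sequence[int]) -> List[str]:
    """List every special token that would have to be added to the LLM's vocabulary."""
    return [
        f"<sid_{level}_{code}>"
        for level, size in enumerate(codebook_sizes)
        for code in range(size)
    ]


def build_lookup(items: Dict[str, Vector], codebooks: Sequence[Codebook]) -> Dict[str, List[str]]:
    """Map each Semantic ID string to the items that share it (stand-in for a key-value store)."""
    lookup: Dict[str, List[str]] = defaultdict(list)
    for item_id, embedding in items.items():
        sid = "".join(to_tokens(residual_quantize(embedding, codebooks)))
        lookup[sid].append(item_id)
    return dict(lookup)


if __name__ == "__main__":
    # Toy 2-D codebooks: level 0 picks a coarse region, level 1 refines within it.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]],
        [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],
    ]
    items = {"episode:a": [1.4, 1.1], "episode:b": [1.1, 1.5], "episode:c": [5.0, 5.1]}
    print(vocabulary_tokens([len(cb) for cb in codebooks]))
    print(build_lookup(items, codebooks))
```

Because several items can land on the same code sequence, the lookup maps an ID to a list of items; systems built this way often append an extra disambiguating level instead. If the model were served through Hugging Face `transformers`, one way to register the new tokens would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` before fine-tuning, though the summary does not say which tooling Spotify used for this step.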

#machine-learning

#llm

#spotify

#recommendation-systems

Nov 25, 2025 • 15m read time • From research.atspotify.com

Table of contents

A Spotify catalog-native vocabulary for LLMs
Domain Specific Training: Learning personalization tasks
Evaluating LLMs that speak Spotify
Scaling
Serving
Conclusion
Acknowledgments
