Personalizing Agentic AI to Users' Musical Tastes with Scalable Preference Optimization

Spotify Research presents a hybrid approach for personalizing AI-powered music recommendations using LLM-based agentic systems. The method combines reward models with Direct Preference Optimization (DPO) to create a continuous learning flywheel that adapts to user preferences inferred from listening behavior. The system interprets natural language queries, orchestrates music search tools, and learns from user interactions such as plays, skips, and saves. Production A/B tests showed a 4% increase in listening time, higher playlist saves, and a 70% reduction in erroneous tool calls, all while maintaining quality standards.
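To make the pairing of reward models with DPO more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, plus one possible way of turning reward scores into such a pair. The summary does not say how Spotify builds its pairs or which hyperparameters it uses, so `build_preference_pair`, `reward_fn`, the best-versus-worst pairing rule, and the default `beta=0.1` are illustrative assumptions rather than the authors' implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being tuned and under a frozen reference model.
    beta=0.1 is an assumed default, not a value from the source.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


def build_preference_pair(candidates, reward_fn):
    """Hypothetical helper: rank candidates by a reward model score and
    return (highest-scored, lowest-scored) as the (chosen, rejected) pair."""
    ranked = sorted(candidates, key=reward_fn, reverse=True)
    return ranked[0], ranked[-1]
```

When the policy assigns relatively more probability to the chosen response than the reference model does, the loss falls below `log(2)`, which is its value when policy and reference agree exactly.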

#machine-learning #llm #spotify #reinforcement-learning #recommendation-systems

Sep 23, 2025 • 9 min read • From research.atspotify.com

Table of contents

Limitations of traditional approaches
A hybrid approach: Reward Models + Direct Preference Optimization
The Preference Tuning Flywheel
Why reward models matter
Stable, scalable fine-tuning
Online experiments
Engineering practices that made the difference
Looking ahead
Acknowledgments
