Ever wondered what happens when your Spring AI chat memory fills up? In this tutorial, inspired by a great question at the Dev2Next conference, I'll show you how to build a custom Compacting Chat Memory Advisor that automatically summarizes your conversation history - just like Claude Code's /compact command!

In this hands-on coding session, we'll solve a real problem in Spring AI applications: managing context windows efficiently. Instead of clearing all messages when you hit the limit, we'll build an intelligent advisor that compacts your conversation history while preserving important context. This is perfect for developers building production-ready AI applications who need better control over token usage and memory management.

🎯 What You'll Learn:
✅ How to create custom advisors in Spring AI (beyond the built-in ones)
✅ Building a smart chat memory system that auto-compacts at configurable thresholds
✅ Understanding context window management in LLM applications
✅ Implementing conversation summarization to optimize token usage
✅ Setting up debug logging to monitor your advisor's behavior

📚 Key Topics Covered:
• Spring AI advisors as AOP-like functions around LLM calls
• Difference between stateless LLMs and stateful chat applications
• MessageChatMemoryAdvisor configuration and limitations
• Creating configurable thresholds for automatic compaction
• Building production-ready AI features from conference conversations

🔧 Prerequisites:
• Basic knowledge of Spring Boot
• Understanding of Spring AI fundamentals
• Java and Maven setup

Ready to level up your Spring AI applications? Watch now to learn how to build custom advisors that solve real-world problems!

🔗 Resources & Links mentioned in this video:
GitHub Repo: https://github.com/danvega/compacting-chat-memory-advisor

👋🏻 Connect with me:
Website: https://www.danvega.dev
Twitter: https://twitter.com/therealdanvega
GitHub: https://github.com/danvega
LinkedIn: https://www.linkedin.com/in/danvega
Newsletter: https://www.danvega.dev/newsletter

SUBSCRIBE TO MY CHANNEL: http://bit.ly/2re4GH0 ❤️

Dan Vega

A walkthrough of building a custom Spring AI advisor that automatically compacts chat memory when the context window fills up. Inspired by Claude Code's /compact command, the advisor monitors message count against a configurable threshold, then uses a secondary LLM (Google Gemini 2.5 Flash) to summarize older messages into a single condensed entry. This reduces token usage while preserving conversational context. The implementation covers advisor structure, bean configuration, multi-model setup (OpenAI as primary, Gemini as summarizer), and debug logging to observe compaction behavior.
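
The compaction step described above can be sketched in plain Java with no Spring AI dependency. This is only an illustration of the idea, not the code from the video or the repo: the class name `CompactingMemorySketch`, the `keepRecent` parameter, and the `summarizer` function (a stand-in for the call to the secondary model) are all hypothetical, and messages are simplified to strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: not a Spring AI class and not the implementation from the video.
final class CompactingMemorySketch {

    private final List<String> messages = new ArrayList<>();
    private final int threshold;   // compact once the history grows past this many messages
    private final int keepRecent;  // newest messages that are kept verbatim
    // Assumption: stands in for the secondary LLM that turns older messages into one summary.
    private final Function<List<String>, String> summarizer;

    CompactingMemorySketch(int threshold, int keepRecent,
                           Function<List<String>, String> summarizer) {
        if (keepRecent < 0 || keepRecent >= threshold) {
            throw new IllegalArgumentException("keepRecent must be in [0, threshold)");
        }
        this.threshold = threshold;
        this.keepRecent = keepRecent;
        this.summarizer = summarizer;
    }

    // Stores a message, then compacts if the history now exceeds the threshold.
    void add(String message) {
        messages.add(message);
        if (messages.size() > threshold) {
            compact();
        }
    }

    // Replaces everything except the newest keepRecent messages with a single summary entry.
    private void compact() {
        int cut = messages.size() - keepRecent;
        List<String> older = new ArrayList<>(messages.subList(0, cut));
        List<String> recent = new ArrayList<>(messages.subList(cut, messages.size()));
        messages.clear();
        messages.add("[summary] " + summarizer.apply(older));
        messages.addAll(recent);
    }

    // Read-only view of the current history.
    List<String> history() {
        return Collections.unmodifiableList(messages);
    }
}
```

With a threshold of 4 and `keepRecent` of 2, adding a fifth message collapses the first three into one summary entry and leaves the last two untouched, so the history shrinks from five entries to three. In a real advisor, the summarizer would call the secondary model and the history would live in the application's chat memory rather than in a local list.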

Build a Smart Chat Memory Advisor in Spring AI That Auto-Compacts Context