A walkthrough of building a custom Spring AI advisor that automatically compacts chat memory before the conversation outgrows the context window. Inspired by Claude Code's /compact command, the advisor checks the message count against a configurable threshold and, once that threshold is exceeded, uses a secondary LLM (Google Gemini 2.5 Flash) to summarize older messages into a single condensed entry. This reduces token usage while preserving conversational context. The implementation covers the advisor structure, bean configuration, multi-model setup (OpenAI as primary, Gemini as summarizer), and debug logging to observe compaction behavior.
25m watch time
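The compaction step described above can be sketched in plain Java, without any Spring AI types. This is a minimal illustration under stated assumptions, not the code from the video: the `Message` record, the `compact` method, the `maxMessages` and `keepRecent` parameters, and the choice of a "system" role for the summary entry are all hypothetical names introduced here. The summarizer is passed in as a function, so the example runs with a fake in place of a real Gemini call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CompactionSketch {

    /** Minimal stand-in for a chat message; hypothetical, not a Spring AI type. */
    record Message(String role, String text) {}

    /**
     * Returns the history unchanged while it holds at most maxMessages entries.
     * Otherwise, replaces everything except the last keepRecent messages with a
     * single summary message produced by the supplied summarizer.
     */
    static List<Message> compact(List<Message> history, int maxMessages, int keepRecent,
                                 Function<List<Message>, String> summarizer) {
        if (keepRecent < 0 || keepRecent >= maxMessages) {
            throw new IllegalArgumentException("keepRecent must be in [0, maxMessages)");
        }
        if (history.size() <= maxMessages) {
            return history; // below the threshold: nothing to compact
        }
        int split = history.size() - keepRecent;
        List<Message> older = history.subList(0, split);

        List<Message> compacted = new ArrayList<>();
        // Assumption: the summary is stored as a "system" message.
        compacted.add(new Message("system",
                "Summary of earlier conversation: " + summarizer.apply(older)));
        compacted.addAll(history.subList(split, history.size()));
        return compacted;
    }

    public static void main(String[] args) {
        List<Message> history = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            history.add(new Message(i % 2 == 1 ? "user" : "assistant", "message " + i));
        }
        // Fake summarizer; in the walkthrough this role is played by a secondary LLM.
        Function<List<Message>, String> fakeSummarizer =
                older -> older.size() + " earlier messages";

        List<Message> result = compact(history, 4, 2, fakeSummarizer);
        result.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

Passing the summarizer as a function keeps the threshold logic testable on its own. In a real advisor, that function would wrap a call to the secondary model, and the compacted list would be written back to chat memory.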