Managing the context window in Large Language Models (LLMs) is essential for optimizing performance and cost. Strategies for truncating chat history include retaining the system message, sending only the last few messages, limiting the total token count, and summarizing older messages. Defining a chat history reducer abstraction, such as an IChatHistoryReducer interface, lets these strategies be implemented interchangeably.
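The "retain the system message, keep only the last few messages" strategy can be sketched as follows. This is a minimal illustrative sketch in Python (the article's actual reducer is a .NET interface); the `ChatMessage` type and `truncate_by_message_count` function are hypothetical names, not the article's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChatMessage:
    """A single chat turn; role is 'system', 'user', or 'assistant'."""
    role: str
    content: str


def truncate_by_message_count(history: List[ChatMessage],
                              max_messages: int) -> List[ChatMessage]:
    """Keep any system messages plus only the most recent max_messages turns."""
    system = [m for m in history if m.role == "system"]
    rest = [m for m in history if m.role != "system"]
    # System messages are preserved in front; older turns are dropped.
    return system + rest[-max_messages:]
```

A token-count or summarization reducer would expose the same shape of function, which is what makes a shared reducer abstraction useful.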

6 min read · From devblogs.microsoft.com
Table of contents

- Key Considerations for Truncating Chat History
- Example Scenario
- Strategies for Truncating Chat History
- Defining a Chat History Reducer Abstraction
- Truncating Based on Message Count
- Truncating Based on Maximum Token Count
- Summarizing Older Messages
