Add middleware for reducing chat history #6666
Pull Request Overview
This PR adds middleware infrastructure for reducing chat history by introducing a ReducingChatClient and IChatReducer interface. This allows for chat message reduction strategies to be applied before messages are sent to the underlying chat client.
- Introduces ReducingChatClient middleware for chat history reduction
- Adds IChatReducer interface for implementing reduction strategies
- Provides builder extensions for easy integration into chat pipelines
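The middleware pattern described above can be sketched in a language-neutral way. This is a minimal illustration in Python, not the C# API from this PR: the `ChatMessage`, `ChatReducer`, `ChatClient`, `KeepLastReducer`, and `CountingClient` names and their method signatures are assumptions made up for the example. It only shows the idea of applying a reducer before delegating to an inner client.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class ChatMessage:
    """Hypothetical message type used only for this sketch."""
    role: str
    text: str


class ChatReducer(Protocol):
    """Hypothetical reducer contract: returns a reduced copy of the messages."""
    def reduce(self, messages: Iterable[ChatMessage]) -> List[ChatMessage]: ...


class ChatClient(Protocol):
    """Hypothetical chat client contract."""
    def get_response(self, messages: Iterable[ChatMessage]) -> ChatMessage: ...


class KeepLastReducer:
    """Example strategy: keep only the most recent `count` messages."""
    def __init__(self, count: int) -> None:
        if count < 1:
            raise ValueError("count must be at least 1")
        self._count = count

    def reduce(self, messages: Iterable[ChatMessage]) -> List[ChatMessage]:
        return list(messages)[-self._count:]


class CountingClient:
    """Stand-in inner client that reports how many messages it received."""
    def get_response(self, messages: Iterable[ChatMessage]) -> ChatMessage:
        received = list(messages)
        return ChatMessage("assistant", f"received {len(received)} messages")


class ReducingChatClient:
    """Middleware: reduce the history, then delegate to the inner client."""
    def __init__(self, inner: ChatClient, reducer: ChatReducer) -> None:
        self._inner = inner
        self._reducer = reducer

    def get_response(self, messages: Iterable[ChatMessage]) -> ChatMessage:
        reduced = self._reducer.reduce(messages)  # caller's history is untouched
        return self._inner.get_response(reduced)
```

In this sketch the reduction runs on every request and the caller's history is never modified, which is the behavior the review comments below discuss.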
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ReducingChatClient.cs | Main middleware implementation that applies reduction before delegating to inner client |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/IChatReducer.cs | Interface defining the contract for chat message reduction strategies |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ReducingChatClientBuilderExtensions.cs | Extension methods for integrating the reducer into chat client pipelines |
| test/Libraries/Microsoft.Extensions.AI.Tests/ChatCompletion/ReducingChatClientTests.cs | Comprehensive unit tests for the new middleware functionality |
| test/Libraries/Microsoft.Extensions.AI.Integration.Tests/ReducingChatClientTests.cs | Removes duplicate implementation that was moved to production code |
Thanks! Can you please sync offline with @crickman and @westey-m? I want to make sure we're factoring in experiences with implementing similar concepts in SK and as part of various agentic libraries. I also want to make sure we've thought through the various scenarios where this will be relevant. With a tokenizer-based reducer, or something that can be implemented entirely client side, this approach seems reasonable. I'm wondering about other needs, like submitting the conversation history or parts of it to an LLM for it to summarize and then replace that part of the conversation... that's not going to fit well into an IChatClient like this, as you'd need to do it as part of every request, rather than persisting the changes and maintaining a constantly summarized history or the like.
Looking at how Semantic Kernel handles chat history reduction, I think the API in this PR should support implementing summarization reducers that can persist summaries between requests by embedding them in the chat history itself. This appears to be similar to Semantic Kernel's We should also consider exposing the built-in reducers as standalone components (not just middleware) so applications can implement their own heuristics for when to reduce history, rather than doing it on every request. I think the next step is to implement some concrete reducers to validate the API design. |
Note that there's a subtle but important difference here from SK. With IChatClient, the input messages list is an IEnumerable that's effectively immutable. With SK's IChatCompletionService, the input messages list is an IList that's mutable. We made a very conscious choice to switch to the immutable model, such that all new messages are returned as part of a ChatResponse, rather than adding some of the messages to the chat history and then only returning the last one. But it does make cases like this one harder.
I was thinking that the summary would be appended to the chat list (e.g., via |
Interesting. Worth pursuing. The devil will probably be in the details of how these things are represented in a way that makes them distinct. We will probably also need some helpers that enable a developer to clean up the resulting chat history.
The original thinking behind the SK reducer was that it would return an updated enumeration that was entirely consistent, without mutating the original history. The developer could then decide whether to replace or mutate the original chat history. Some utility extensions were provided to make opt-in mutation or replacement easy.
One advantage is that this avoided a large re-allocation during reduction (until the developer expressed their intent with the enumeration).
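The non-mutating approach described above can be sketched as follows. This is a minimal Python illustration, not Semantic Kernel's actual API: `reduce_last_n` and `replace_history` are hypothetical names. The reducer returns a lazy view over the tail of the history, so nothing is copied until the developer opts in to replacing the original list.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def reduce_last_n(history: List[str], n: int) -> Iterator[str]:
    """Return a lazy view of the last n messages without mutating history."""
    if n < 0:
        raise ValueError("n must be non-negative")
    start = max(len(history) - n, 0)
    return islice(history, start, None)


def replace_history(history: List[str], reduced: Iterable[str]) -> None:
    """Opt-in helper: overwrite the original history with the reduced messages."""
    # list(...) materializes the lazy view before the in-place assignment.
    history[:] = list(reduced)


history = ["m1", "m2", "m3", "m4", "m5"]
reduced = reduce_last_n(history, 2)   # history still holds all five messages
replace_history(history, reduced)     # explicit opt-in to mutation
```

The allocation for the reduced list happens only inside `replace_history`, which mirrors the point about deferring re-allocation until the developer expresses intent.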
I just added a prototype implementation of a summarizing reducer that works by embedding summaries in the
The implementation I've put in this PR is a demonstration of this approach. Happy to consider alternatives. If we do pursue this implementation strategy, there's more I'll need to follow up on:
These are now done:
Follow-ups to consider:
On the topic of agentic libraries: I looked into this, and the proposed abstractions and chat summarizer implementation in this PR seem like a natural fit. If multiple agents are participating in the same conversation, it may make more sense to use an That said, if the goal is for each agent to handle its own reduction (or opt out entirely) the middleware approach should still work, since the message list isn't destructively mutated. The one caveat is with If that scenario is a concern, we could consider making the |
/backport to release/9.8
Started backporting to release/9.8: https://github.com/dotnet/extensions/actions/runs/16915783371
Summary
Adds chat client middleware to reduce chat history.
Fixes #6647
Description
The design is based primarily on the existing ReducingChatClient defined in our integration tests.

It might also be worth implementing some chat reducers that we expect to be commonly used (e.g., one that truncates chat history, or one that uses a chat client to summarize chat history into a smaller number of messages). Even if we don't ship these, it would at least validate the design.
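As a rough illustration of the first kind of reducer mentioned above, here is a minimal Python sketch of a truncating reducer. It is not code from this PR: the `Message` type, the `truncate_history` name, and the choice to always keep a leading system message are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Message:
    """Hypothetical message type used only for this sketch."""
    role: str
    text: str


def truncate_history(messages: Iterable[Message], max_messages: int) -> List[Message]:
    """Keep a leading system message (if any) plus the last max_messages others."""
    if max_messages < 0:
        raise ValueError("max_messages must be non-negative")
    items = list(messages)
    # Assumption: a system prompt, when present, is the first message.
    head = items[:1] if items and items[0].role == "system" else []
    tail = items[len(head):]
    kept = tail[-max_messages:] if max_messages > 0 else []
    return head + kept
```

A summarizing reducer would follow the same shape but replace the dropped messages with a generated summary instead of discarding them.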