@MackinnonBuck
Member

@MackinnonBuck MackinnonBuck commented Jul 30, 2025

Summary

Adds chat client middleware to reduce chat history.

Fixes #6647

Description

The design is based primarily on the existing ReducingChatClient defined in our integration tests.

It might also be worth implementing some chat reducers that we expect to be commonly used (e.g., one that truncates chat history, or one that uses a chat client to summarize chat history into a smaller number of messages). Even if we don't ship these, it would at least validate the design.

Copilot AI review requested due to automatic review settings July 30, 2025 18:36
@MackinnonBuck MackinnonBuck requested a review from a team as a code owner July 30, 2025 18:36
@github-actions github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Jul 30, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds middleware infrastructure for reducing chat history by introducing a ReducingChatClient and IChatReducer interface. This allows for chat message reduction strategies to be applied before messages are sent to the underlying chat client.

  • Introduces ReducingChatClient middleware for chat history reduction
  • Adds IChatReducer interface for implementing reduction strategies
  • Provides builder extensions for easy integration into chat pipelines
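
As a rough illustration of the middleware shape summarized above, here is a language-neutral sketch in Python. It is not the C# API from this PR: `Message`, `Reducer`, `KeepLastN`, and `ReducingClient` are hypothetical stand-ins, and the truncation strategy is only an assumed example of a reducer. The wrapper reduces the incoming messages and passes the reduced list to the inner client, leaving the caller's history untouched.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Message:
    role: str  # "system", "user", or "assistant"
    text: str


class Reducer(Protocol):
    """Hypothetical stand-in for the reducer abstraction (not the real interface)."""

    def reduce(self, messages: Iterable[Message]) -> list[Message]: ...


class KeepLastN:
    """Illustrative reducer: keep every system message plus the last n others."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n

    def reduce(self, messages: Iterable[Message]) -> list[Message]:
        msgs = list(messages)
        system = [m for m in msgs if m.role == "system"]
        others = [m for m in msgs if m.role != "system"]
        return system + others[-self.n:]


class ReducingClient:
    """Wraps an inner client and reduces the messages before delegating."""

    def __init__(self, inner: Callable[[list[Message]], Message], reducer: Reducer) -> None:
        self._inner = inner
        self._reducer = reducer

    def get_response(self, messages: Iterable[Message]) -> Message:
        # Build a new, smaller list; the caller's collection is never modified.
        reduced = self._reducer.reduce(messages)
        return self._inner(reduced)
```

Because the reducer returns a new list, the application can keep the full history as its source of truth while the inner client only sees the reduced view.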

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ReducingChatClient.cs | Main middleware implementation that applies reduction before delegating to inner client
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/IChatReducer.cs | Interface defining the contract for chat message reduction strategies
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ReducingChatClientBuilderExtensions.cs | Extension methods for integrating the reducer into chat client pipelines
test/Libraries/Microsoft.Extensions.AI.Tests/ChatCompletion/ReducingChatClientTests.cs | Comprehensive unit tests for the new middleware functionality
test/Libraries/Microsoft.Extensions.AI.Integration.Tests/ReducingChatClientTests.cs | Removes duplicate implementation that was moved to production code

@stephentoub
Member

Thanks!

Can you please sync offline with @crickman and @westey-m? I want to make sure we're factoring in experiences with implementing similar concepts in SK and as part of various agentic libraries.

I also want to make sure we've thought through the various scenarios where this will be relevant. With a tokenizer-based reducer, or something that can be implemented entirely client side, this approach seems reasonable. I'm wondering about other needs, like submitting the conversation history or parts of it to an LLM for it to summarize and then replace that part of the conversation... that's not going to fit well into an IChatClient like this, as you'd need to do it as part of every request, rather than persisting the changes and maintaining a constantly summarized history or the like.

@MackinnonBuck
Member Author

Looking at how Semantic Kernel handles chat history reduction, I think the API in this PR should support implementing summarization reducers that can persist summaries between requests by embedding them in the chat history itself. This appears to be similar to Semantic Kernel's ChatHistorySummarizationReducer.

We should also consider exposing the built-in reducers as standalone components (not just middleware) so applications can implement their own heuristics for when to reduce history, rather than doing it on every request.

I think the next step is to implement some concrete reducers to validate the API design.

@stephentoub
Member

> Looking at how Semantic Kernel handles chat history reduction, I think the API in this PR should support implementing summarization reducers that can persist summaries between requests by embedding them in the chat history itself. This appears to be similar to Semantic Kernel's ChatHistorySummarizationReducer.

Note that there's a subtle but important difference here from SK. With IChatClient, the input messages list is an IEnumerable that's effectively immutable. With SK's IChatCompletionService, the input messages list is an IList that's mutable. We made a very conscious choice to switch to the immutable model, such that all new messages are returned as part of a ChatResponse, rather than adding some of the messages to the chat history and then only returning the last one. But it does make cases like this one harder.

@MackinnonBuck
Member Author

MackinnonBuck commented Jul 31, 2025

> We made a very conscious choice to switch to the immutable model, such that all new messages are returned as part of a ChatResponse, rather than adding some of the messages to the chat history and then only returning the last one. But it does make cases like this one harder.

I was thinking that the summary would be appended to the chat list (e.g., via ChatResponseExtensions.AddMessages()) just like any other message, except it wouldn't be displayed in the app's UI. On the next call to GetResponseAsync(), the ReducingChatClient would construct a modified message list that excludes previously-summarized chat messages but keeps the latest summary.

@stephentoub
Member

> > We made a very conscious choice to switch to the immutable model, such that all new messages are returned as part of a ChatResponse, rather than adding some of the messages to the chat history and then only returning the last one. But it does make cases like this one harder.

> I was thinking that the summary would be appended to the chat list (e.g., via ChatResponseExtensions.AddMessages()) just like any other message, except it wouldn't be displayed in the app's UI. On the next call to GetResponseAsync(), the ReducingChatClient would construct a modified message list that excludes previously-summarized chat messages but keeps the latest summary.

Interesting. Worth pursuing. The devil will probably be in the details of how these things are represented in a way that makes them distinct. We will also probably need some helpers that enable a developer to clean up the resulting chat history.

@crickman
Contributor

crickman commented Jul 31, 2025 via email

@MackinnonBuck
Member Author

I just added a prototype implementation of a summarizing reducer that works by embedding summaries in the AdditionalProperties of chat messages. If a chat message has a summary in its AdditionalProperties, that summary encapsulates the entire conversation leading up to and including that message. This enables the following:

  • Incremental summary generation: A new summary can be generated by finding the most recent existing summary and folding in only the chat messages that came after it.
  • Conversation preservation: For apps that use the message list as the "source of truth" for a chat UI (as the chat template does), a destructive replacement of the chat history might not be ideal. The prototyped approach allows preserving chat history without sacrificing incremental summary generation.
  • Summary trail: If a chat application supports editing and re-submitting a previous message (i.e., "forking" the conversation), the summarizer would automatically use the latest summary available from that point in the conversation.
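
The approach in the list above can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not the prototype in this PR: `Message`, `reduce_with_summary`, the `"summary"` key, the `keep_last` parameter, and the caller-supplied `summarize` function are all hypothetical. It stores each summary on the last message it covers, leaves the original list intact, and builds the next summary from the most recent one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

SUMMARY_KEY = "summary"  # hypothetical property key, chosen for illustration


@dataclass
class Message:
    role: str
    text: str
    additional_properties: dict = field(default_factory=dict)


def reduce_with_summary(
    messages: list[Message],
    summarize: Callable[[Optional[str], list[Message]], str],
    keep_last: int = 2,
    key: str = SUMMARY_KEY,
) -> list[Message]:
    """Return a reduced view; messages are annotated but never removed."""
    # Find the most recent message that already carries a summary.
    last_idx = -1
    for i in range(len(messages) - 1, -1, -1):
        if key in messages[i].additional_properties:
            last_idx = i
            break
    prior = messages[last_idx].additional_properties[key] if last_idx >= 0 else None

    unsummarized = messages[last_idx + 1:]
    if len(unsummarized) > keep_last:
        cut = len(unsummarized) - keep_last
        to_fold = unsummarized[:cut]
        # Incremental step: only the not-yet-summarized messages are folded in.
        prior = summarize(prior, to_fold)
        # The summary covers everything up to and including this message.
        to_fold[-1].additional_properties[key] = prior
        unsummarized = unsummarized[cut:]

    head = [Message("assistant", prior)] if prior is not None else []
    return head + unsummarized
```

Calling the function on a truncated copy of the list (a "fork") picks up whichever summary is latest at that point, since the lookup only considers the messages it is given.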

The implementation I've put in this PR is a demonstration of this approach. Happy to consider alternatives. If we do pursue this implementation strategy, there's more I'll need to follow up on:

  • Helpers to clean up the chat history (replacing the existing chat history with a summarized one, removing summary metadata, etc.)
  • Summarizer configuration, such as customizing the system prompt
  • Integration tests
  • Incorporation into the chat template, if we decide to do that

@MackinnonBuck
Member Author

These are now done:

  • Helpers to clean up the chat history
    • You can do this by using an IChatReducer "directly" instead of adding it to the middleware pipeline
  • Customization of the summarization prompt in SummarizingChatReducer
  • Integration tests
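
Using a reducer "directly", as mentioned in the list above, might look something like the following Python sketch. The names `keep_last` and `compact_if_needed`, the tuple-based message type, and the size threshold are assumptions made for illustration. The point is that the application, rather than the middleware, decides when to reduce and then replaces its stored history with the result.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text); simplified stand-in for a chat message


def keep_last(n: int) -> Callable[[list[Message]], list[Message]]:
    """Build an illustrative reducer that keeps only the last n messages."""

    def reduce(messages: list[Message]) -> list[Message]:
        return list(messages[-n:])

    return reduce


def compact_if_needed(
    history: list[Message],
    reduce: Callable[[list[Message]], list[Message]],
    threshold: int,
) -> bool:
    """Replace the stored history in place once it grows past the threshold."""
    if len(history) <= threshold:
        return False
    history[:] = reduce(history)
    return True
```

Any heuristic could replace the length check, for example an estimated token count or an explicit user action.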

Follow-ups to consider:

  • Integration with the AI Chat Web template. I've done this locally, but we need to decide if this is what we want by default.
  • Which other IChatReducer implementations we should support out of the box.

@MackinnonBuck
Member Author

> I want to make sure we're factoring in experiences with implementing similar concepts in SK and as part of various agentic libraries.

On the topic of agentic libraries: I looked into this, and the proposed abstractions and chat summarizer implementation in this PR seem like a natural fit. If multiple agents are participating in the same conversation, it may make more sense to use an IChatReducer directly to reduce the shared conversation, rather than having each agent use a ReducingChatClient that performs its own reduction within its middleware pipeline.

That said, if the goal is for each agent to handle its own reduction (or opt out entirely), the middleware approach should still work, since the message list isn't destructively mutated. The one caveat is with SummarizingChatReducer, which mutates ChatMessage.AdditionalProperties to store summaries between requests. If multiple chat clients each use their own SummarizingChatReducer with different configurations (e.g., different summarization instructions), they could end up stepping on each other, because the last summary in the message list becomes the basis for the next one.

If that scenario is a concern, we could consider making the AdditionalProperties key configurable in SummarizingChatReducer. That would allow multiple summaries to coexist in the same message list, where each one is tailored to the needs of a specific agent.
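
To make the collision and the proposed fix concrete, here is a small Python sketch. The key names, `Message`, `store_summary`, and `latest_summary` are hypothetical and do not reflect the actual SummarizingChatReducer. With a shared key, the second writer overwrites the first; with per-agent keys, both summaries coexist on the same message.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    text: str
    additional_properties: dict = field(default_factory=dict)


def store_summary(message: Message, key: str, summary: str) -> None:
    """Attach a summary to a message under the given property key."""
    message.additional_properties[key] = summary


def latest_summary(messages: list[Message], key: str) -> Optional[str]:
    """Return the most recent summary stored under key, or None."""
    for message in reversed(messages):
        if key in message.additional_properties:
            return message.additional_properties[key]
    return None


history = [Message("hello"), Message("how do I reset my password?")]

# Shared key: the second agent's summary replaces the first agent's.
store_summary(history[-1], "summary", "agent A view")
store_summary(history[-1], "summary", "agent B view")
shared = latest_summary(history, "summary")

# Per-agent keys (hypothetical naming): each agent finds its own summary.
store_summary(history[-1], "summary/agent-a", "agent A view")
store_summary(history[-1], "summary/agent-b", "agent B view")
```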

@jeffhandley jeffhandley merged commit e37ad8d into main Aug 12, 2025
6 checks passed
@jeffhandley jeffhandley deleted the mbuck/chat-reducer-middleware branch August 12, 2025 17:10
@jeffhandley
Member

/backport to release/9.8

@github-actions
Contributor

Started backporting to release/9.8: https://github.com/dotnet/extensions/actions/runs/16915783371

Linked issue: [AI] Middleware for reducing chat history

5 participants