Make conversation history summarization threshold public #270528

@Ovid

Description

Problem

The github.copilot.chat.advanced.summarizeAgentConversationHistoryThreshold setting appears to trigger conversation summarization when the context window fills up. It is currently marked as INTERNAL, making it unreliable for users who need to tune LLM performance. Research and anecdotal evidence suggest that filling the entire context window can degrade LLM performance, with some organizations recommending starting new chats at roughly 40% context utilization.

I've found that performance degrades, sometimes significantly, as the context window fills up. Lowering this threshold would mean summarizing more often, but it might also improve both quality and latency.

Proposed Solution

Make the setting public and enhance it with three configuration modes:

1. Hard token number (existing)

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": 80000
}

2. Percentage-based threshold

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": "40%"
}

For a model with 200K max prompt tokens, this would trigger at 80K tokens.

3. Per-model configuration

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": {
    "gpt-4o": "40%",
    "claude-3.5-sonnet": 100000,
    "default": "50%"
  }
}

(Note: the above shows three different data types for the same key. A single key is easy to discover, but the validation code becomes more complex. Separate keys (such as github.copilot.chat.summarizeAgentConversationHistoryThresholdPercent) would be harder to discover, but simpler to validate.)

Rationale

  • Different models have different performance characteristics at various context utilization levels
  • Users should be able to optimize for quality vs. context retention based on their use case
  • Percentage-based configuration is more portable across models with different context windows

Implementation Notes

The existing infrastructure in src/platform/endpoint/node/chatEndpoint.ts already provides modelMaxPromptTokens, which can be used to calculate percentage-based thresholds. The configuration system in src/platform/configuration/common/configurationService.ts supports complex types that could accommodate this schema.
