Make conversation history summarization threshold public #270528

@Ovid

Description

Problem

The github.copilot.chat.advanced.summarizeAgentConversationHistoryThreshold setting appears to trigger conversation summarization when the context window fills up. It is currently marked as INTERNAL, making it unreliable for users who need to tune LLM performance. Research and anecdotal evidence suggest that filling the entire context window can degrade LLM performance, with some organizations recommending starting new chats at roughly 40% context utilization.

I've found that performance degrades, sometimes significantly, as the context window fills up. Lowering this threshold would mean summarizing more often, but it might also improve both quality and latency.

Proposed Solution

Make the setting public and enhance it with three configuration modes:

1. Hard token number (existing)

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": 80000
}

2. Percentage-based threshold

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": "40%"
}

For a model with 200K max prompt tokens, this would trigger at 80K tokens.

3. Per-model configuration

{
  "github.copilot.chat.summarizeAgentConversationHistoryThreshold": {
    "gpt-4o": "40%",
    "claude-3.5-sonnet": 100000,
    "default": "50%"
  }
}

(Note: the above shows three different data types for the same key. A single key is easy to discover, but the validation code becomes more complex. Separate keys (such as github.copilot.chat.summarizeAgentConversationHistoryThresholdPercent) would be harder to discover, but simpler to validate.)

Rationale

  • Different models have different performance characteristics at various context utilization levels
  • Users should be able to optimize for quality vs. context retention based on their use case
  • Percentage-based configuration is more portable across models with different context windows

Implementation Notes

The existing infrastructure in src/platform/endpoint/node/chatEndpoint.ts already provides modelMaxPromptTokens, which can be used to calculate percentage-based thresholds. The configuration system in src/platform/configuration/common/configurationService.ts supports complex types that could accommodate this schema.
