[Feature]: Re-budget the context compressor when a router serves a different backend per request #37719

@iamfoz

Description

Problem or Use Case

When the configured model is a router id (openrouter/auto, :free-suffixed names, or a model fallback chain), the router can pick a different backend for each request, and each backend has a different context window. Hermes budgets compression once, against the configured model id. So when the router serves a smaller-window backend, the agent can exceed the real limit before compaction triggers. When it serves a larger-window backend, Hermes compacts earlier than necessary and wastes usable context. There is currently no way for the compaction budget to follow the backend that actually served a given call.
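The mismatch can be illustrated with a little arithmetic. This is only a sketch: the window sizes and the 80% trigger ratio are made-up illustrative numbers, and `compaction_gap` is a hypothetical helper, not something that exists in Hermes.

```python
def compaction_gap(budgeted_window: int, served_window: int, trigger_pct: int = 80) -> dict:
    """Compare a trigger computed for one window against the window actually served.

    All values are token counts. trigger_pct is an assumed ratio for illustration only.
    """
    # Trigger point as budgeted once, against the configured model id.
    trigger = budgeted_window * trigger_pct // 100
    # Trigger point the served backend would have needed.
    ideal_trigger = served_window * trigger_pct // 100
    return {
        "trigger_tokens": trigger,
        # True when the conversation can outgrow the real window before compaction fires.
        "overflow_before_compaction": trigger > served_window,
        # Context left on the table when compaction fires earlier than needed.
        "unused_tokens": max(0, ideal_trigger - trigger),
    }


# Budgeted for a 128k window, but the router served a 32k backend:
small = compaction_gap(128_000, 32_000)
# Budgeted for a 128k window, but the router served a 200k backend:
large = compaction_gap(128_000, 200_000)
```

In the first case the trigger sits far above the real limit, so the request fails before compaction runs. In the second, compaction fires with tens of thousands of tokens still usable.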

Proposed Solution

Read the model field from each successful API response (response.model). When the served backend changes, look up that backend's context_length and re-budget the compressor thresholds (trigger point, tail budget, summary cap) to the real window. Gate it behind a config key, compression.adaptive_context_window, default false, so it is a no-op for anyone who has not opted in. Cost is at most one model_metadata lookup per backend transition, and lookups are cached on disk, so repeated transitions to the same backend are free.
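A rough sketch of the flow described above, to make the shape of the change concrete. Everything here is an assumption for illustration: `CompressorBudget`, `budget_for_window`, `AdaptiveBudgeter`, the ratios, and the injected `lookup_context_length` callable are hypothetical names, not the existing Hermes API. The in-memory dict stands in for the on-disk model_metadata cache, and `enabled` stands in for compression.adaptive_context_window.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CompressorBudget:
    trigger_tokens: int      # compaction starts at this many tokens
    tail_tokens: int         # recent context preserved verbatim
    summary_cap_tokens: int  # upper bound on the generated summary


def budget_for_window(context_length: int) -> CompressorBudget:
    # Ratios are placeholders; integer math avoids float rounding surprises.
    return CompressorBudget(
        trigger_tokens=context_length * 80 // 100,
        tail_tokens=context_length * 20 // 100,
        summary_cap_tokens=context_length * 5 // 100,
    )


class AdaptiveBudgeter:
    def __init__(
        self,
        configured_model: str,
        lookup_context_length: Callable[[str], int],
        enabled: bool = False,  # mirrors the proposed default of false
    ) -> None:
        self.enabled = enabled
        self._lookup = lookup_context_length
        self._cache: Dict[str, int] = {}  # stand-in for the on-disk cache
        self.served_model = configured_model
        self.budget = budget_for_window(self._context_length(configured_model))

    def _context_length(self, model: str) -> int:
        # At most one metadata lookup per distinct backend.
        if model not in self._cache:
            self._cache[model] = self._lookup(model)
        return self._cache[model]

    def on_response(self, response_model: Optional[str]) -> bool:
        """Call after each successful API response; returns True if re-budgeted."""
        if not self.enabled or not response_model:
            return False  # opted out, or the response carried no model field
        if response_model == self.served_model:
            return False  # same backend as last call, nothing to do
        self.served_model = response_model
        self.budget = budget_for_window(self._context_length(response_model))
        return True
```

With the flag off, `on_response` never touches the budget, so behavior matches today's one-time budgeting against the configured model id.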

Alternatives Considered

Static per-model config overrides handle the case where you know the backend ahead of time, and manual re-probes cover explicit events. Several open PRs already take those routes: #24495 (per-model context_length and provider_routing overrides), #37548 (respect model.context_length config), #36199 (persist resolved context_length after /model switch), #31067 (reinitialize the compressor on /new), and #31492 (re-probe on session reset). None of them can see a silent per-request router swap, because config does not know which backend the router chose and there is no explicit switch event to hook. The only signal of what actually served a call is response.model. This is meant to be complementary to those PRs, not a replacement: config sets your intended defaults, this corrects the budget at runtime when the router ignores them.

Feature Type

Performance / reliability

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Metadata

    Labels

    P3: Low (cosmetic, nice to have)
    comp/agent: Core agent loop, run_agent.py, prompt builder
    type/feature: New feature or request
