Problem or Use Case
When the configured model is a router id (openrouter/auto, :free-suffixed names, or a model fallback chain), the router can pick a different backend for each request, and each backend has a different context window. Hermes budgets compression once, against the configured model id. When the router serves a smaller-window backend, the agent can exceed the real limit before compaction triggers. When it serves a larger-window backend, Hermes compacts earlier than necessary and wastes usable context. There is currently no way for the compaction budget to follow the backend that actually served a given call.
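To make the mismatch concrete, here is a small arithmetic sketch. The 80% trigger ratio and all window sizes are made-up numbers for illustration, not Hermes defaults, and `trigger_point` and `classify` are hypothetical helpers.

```python
def trigger_point(context_length: int, percent: int = 80) -> int:
    """Token count at which compaction would fire (hypothetical 80% rule)."""
    return context_length * percent // 100


def classify(configured_window: int, served_window: int) -> str:
    """Describe what a budget fixed at the configured window does on a given backend."""
    static_trigger = trigger_point(configured_window)
    if static_trigger > served_window:
        # The prompt outgrows the real window before compaction ever fires.
        return "overflow"
    if static_trigger < trigger_point(served_window):
        # Compaction fires while the real window still has usable room.
        return "early"
    return "ok"


configured = 128_000  # assumed window for the configured router id
for served in (32_000, 128_000, 1_000_000):
    print(served, classify(configured, served))
```

With these assumed numbers, the static trigger sits at 102,400 tokens, which is already more than three times the whole window of the 32,000-token backend.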
Proposed Solution
Read the model field from each successful API response (response.model). When the served backend changes, look up that backend's context_length and re-budget the compressor thresholds (trigger point, tail budget, summary cap) to the real window. Gate it behind a config key, compression.adaptive_context_window, default false, so it is a no-op for anyone who has not opted in. Cost is at most one model_metadata lookup per backend transition, and lookups are cached on disk, so repeated transitions to the same backend are free.
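The flow above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Hermes code: `CompressorBudget`, `budget_for`, `AdaptiveBudgeter`, and the percentage splits are hypothetical names and values, the `lookup_context_length` callable stands in for the model_metadata lookup, and an in-memory dict stands in for the on-disk cache.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CompressorBudget:
    context_length: int
    trigger_tokens: int      # compaction fires at this many tokens
    tail_tokens: int         # recent messages kept verbatim
    summary_cap_tokens: int  # upper bound on the summary size


def budget_for(context_length: int) -> CompressorBudget:
    # Hypothetical ratios; the real thresholds come from Hermes config.
    return CompressorBudget(
        context_length=context_length,
        trigger_tokens=context_length * 80 // 100,
        tail_tokens=context_length * 20 // 100,
        summary_cap_tokens=context_length * 5 // 100,
    )


class AdaptiveBudgeter:
    def __init__(
        self,
        initial_context_length: int,
        lookup_context_length: Callable[[str], Optional[int]],
        enabled: bool = False,  # mirrors compression.adaptive_context_window
    ) -> None:
        self.enabled = enabled
        self._lookup = lookup_context_length
        self._cache: Dict[str, Optional[int]] = {}  # stand-in for the disk cache
        self._served_model: Optional[str] = None
        self.budget = budget_for(initial_context_length)

    def observe(self, response_model: Optional[str]) -> bool:
        """Feed response.model after a successful call; True if the budget changed."""
        if not self.enabled or not response_model:
            return False
        if response_model == self._served_model:
            return False  # same backend as the previous call, nothing to do
        self._served_model = response_model
        if response_model not in self._cache:
            # At most one lookup per backend; later transitions hit the cache.
            self._cache[response_model] = self._lookup(response_model)
        window = self._cache[response_model]
        if window is None or window == self.budget.context_length:
            return False  # unknown backend or same window: keep current budget
        self.budget = budget_for(window)
        return True
```

A caller would invoke `observe(response.model)` once per successful response and read `budgeter.budget` before deciding whether to compact. Keeping the current budget when the lookup returns nothing is one possible policy for unknown backends; the proposal itself does not specify that case.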
Alternatives Considered
Static per-model config overrides handle the case where you know the backend ahead of time, and manual re-probes cover explicit events. Several open PRs already take those routes: #24495 (per-model context_length and provider_routing overrides), #37548 (respect model.context_length config), #36199 (persist resolved context_length after /model switch), #31067 (reinitialize the compressor on /new), and #31492 (re-probe on session reset). None of them can see a silent per-request router swap, because config does not know which backend the router chose and there is no explicit switch event to hook. The only signal of what actually served a call is response.model. This proposal is meant to complement those PRs, not replace them: config sets your intended defaults, and this feature corrects the budget at runtime when the router serves a different backend.
Feature Type
Performance / reliability
Scope
Medium (few files, < 300 lines)
Contribution
Debug Report (optional)