feat(agent): re-budget context compressor when a router swaps the backend #37720
Open
iamfoz wants to merge 1 commit into
…et compressor

Re-anchored to agent/adaptive_context.py + agent/agent_init.py + agent/conversation_loop.py after upstream's refactor.

Adds AdaptiveContextTracker (new module) and wires it into both init (config flag gated) and the response handling path. After each successful API call, if response.model differs from the previously observed value, looks up the new backend's context_length via agent.model_metadata and calls ContextCompressor.update_model() to rebudget thresholds.

Behaviour change is gated on compression.adaptive_context_window in cli-config.yaml (default false), so this is a no-op for everyone who hasn't opted in. The observe hook is defensive: any exception in the adaptive path is logged at debug and swallowed so the agent loop is never interrupted by it.

Helps users whose configured model is a router id (openrouter/auto, :free-suffix variants, model fallback chains) where the actual backend selected per request, and its context window, varies.

CLI /usage and gateway /usage both surface live router-tracking state when the feature is enabled, including baseline/changes summary.
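As a rough illustration of the tracker described in this commit message, here is a minimal sketch. It is not the code in agent/adaptive_context.py. The method names observe, last_seen, change_count and summary come from the PR text; everything else is an assumption made for illustration: the private attributes, the summary keys, the whitespace stripping, and the choice to treat the first observation as a baseline that returns None.

```python
from typing import Dict, Optional, Union


class AdaptiveContextTracker:
    """Illustrative sketch only: remembers which backend served the last call."""

    def __init__(self) -> None:
        self._baseline: Optional[str] = None   # first backend ever observed
        self._last_seen: Optional[str] = None  # most recent backend observed
        self._changes = 0                      # number of backend swaps seen

    def observe(self, response_model: object) -> Optional[str]:
        """Return the new backend id when it differs from the last one, else None."""
        # Invalid-input guard: ignore anything that is not a non-empty string.
        if not isinstance(response_model, str) or not response_model.strip():
            return None
        model = response_model.strip()
        if self._last_seen is None:
            # Assumption: the first observation only records a baseline.
            self._baseline = model
            self._last_seen = model
            return None
        if model == self._last_seen:
            return None  # same backend as before, nothing to re-budget
        self._last_seen = model
        self._changes += 1
        return model

    def last_seen(self) -> Optional[str]:
        return self._last_seen

    def change_count(self) -> int:
        return self._changes

    def summary(self) -> Dict[str, Union[str, int, None]]:
        # Hypothetical key names; the real /usage summary shape may differ.
        return {
            "baseline": self._baseline,
            "current": self._last_seen,
            "changes": self._changes,
        }


tracker = AdaptiveContextTracker()
tracker.observe("vendor/model-a")            # records the baseline
tracker.observe("vendor/model-a")            # same backend, no change
changed = tracker.observe("vendor/model-b")  # backend swap detected
```

Keeping the tracker free of I/O, as the PR describes, means the caller decides what to do with a detected change, and the class can be unit-tested without a live router.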
What does this PR do?
Routers like openrouter/auto, :free-suffixed names, and model fallback chains silently pick a different backend per request, and each backend has a different context window. Today Hermes budgets compression once, against the configured model id, so when the router serves a smaller-window backend the agent can blow past the real limit before compaction kicks in, or compact far too early on a larger one.

The existing context-length work all re-budgets on an explicit, known-ahead event: static config overrides (#24495, #37548), or a manual trigger like /model (#36199), /new (#31067), or session reset (#31492). None of them can see a silent per-request swap, because the only signal of which backend actually served a call is the model field on the response.

This PR adds an AdaptiveContextTracker that reads response.model after each successful call. When the backend changes, it looks up the new backend's context_length and re-budgets the compressor via ContextCompressor.update_model() to match the real window.

It is complementary to #24495, not competing. Config overrides set your defaults; this corrects the budget when the router ignores them at runtime.

The whole feature is gated behind compression.adaptive_context_window (default false), so it is a guaranteed no-op for everyone who has not opted in. The observe hook is defensive: any exception in the adaptive path is logged at debug and swallowed, so the agent loop is never interrupted by it.

Related Issue
Fixes #37719
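The re-budget step described under "What does this PR do?" could look roughly like the sketch below. It is a sketch under stated assumptions, not the code in agent/conversation_loop.py. The function name rebudget_on_backend_change, the lookup_context_length callable (standing in for the agent.model_metadata lookup), the two-argument form of update_model(), and both stub classes are hypothetical. The PR text only confirms that ContextCompressor.update_model() is called and that failures are debug-logged and swallowed.

```python
import logging
from types import SimpleNamespace
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)


class _StubTracker:
    """Hypothetical stand-in for the tracker: reports a backend id on change."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def observe(self, model: object) -> Optional[str]:
        if not isinstance(model, str) or not model:
            return None
        changed = self._last is not None and model != self._last
        self._last = model
        return model if changed else None


class _StubCompressor:
    """Hypothetical stand-in that just records update_model() calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int]] = []

    def update_model(self, model: str, context_length: int) -> None:
        # Assumed signature; the real ContextCompressor.update_model() may differ.
        self.calls.append((model, context_length))


def rebudget_on_backend_change(
    tracker: Optional[_StubTracker],
    compressor: _StubCompressor,
    response: object,
    lookup_context_length: Callable[[str], Optional[int]],
) -> bool:
    """Re-budget the compressor if the serving backend changed; never raise."""
    if tracker is None:
        return False  # feature disabled: the tracker was never attached
    try:
        new_model = tracker.observe(getattr(response, "model", None))
        if new_model is None:
            return False  # baseline, unchanged backend, or unusable value
        context_length = lookup_context_length(new_model)
        if not context_length:
            return False  # unknown window: keep the existing budget
        compressor.update_model(new_model, context_length)
        return True
    except Exception:
        # Defensive: log at debug and swallow so the agent loop keeps running.
        logger.debug("adaptive context re-budget failed", exc_info=True)
        return False


tracker = _StubTracker()
compressor = _StubCompressor()
windows = {"vendor/small": 8192}  # made-up metadata for the example

first = rebudget_on_backend_change(
    tracker, compressor, SimpleNamespace(model="vendor/big"), windows.get
)
second = rebudget_on_backend_change(
    tracker, compressor, SimpleNamespace(model="vendor/small"), windows.get
)
```

Returning a boolean instead of raising keeps the hook safe to call from the response-handling path: a failed lookup leaves the previous budget in place.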
Type of Change
Changes Made
- agent/adaptive_context.py (new): AdaptiveContextTracker.observe(response_model) returns the new backend id on a change, else None, plus last_seen(), change_count(), summary(). Stateful, no I/O.
- agent/agent_init.py: instantiate and attach the tracker when compression.adaptive_context_window is enabled, otherwise leave it unset.
- agent/conversation_loop.py: after each successful API call, feed response.model to the tracker. On a change, resolve the backend's context_length via agent.model_metadata and call ContextCompressor.update_model(). Wrapped in a defensive try/except (debug-logged, swallowed).
- cli.py and gateway/run.py: /usage surfaces live router-tracking state (baseline backend, change count, current backend) when the feature is enabled.
- cli-config.yaml.example: documents the new compression.adaptive_context_window key (default false).
- locales/en.yaml: strings for the /usage tracking summary.
- tests/agent/test_adaptive_context.py (new): 10 tests.

How to Test
Automated:
Covers first-observation baseline, no-op on same model, change detection with counter increment, multiple and router-to-concrete transitions, invalid-input guards, and the /usage summary shape.

Manual (real router):
1. In ~/.hermes/config.yaml, set compression.adaptive_context_window: true and point the model at a router such as openrouter/auto.
2. /usage shows the tracked baseline backend and a non-zero change count once the backend swaps.
3. Compression thresholds re-budget to the new backend's window, visible as a changed compaction point in debug logs.

No-op check (proves zero impact when disabled):
4. Leave adaptive_context_window unset or false and confirm behaviour matches main. The tracker is never instantiated.

Checklist
Code
pytest tests/ -q and all tests pass

Documentation & Housekeeping
cli-config.yaml.example, /usage strings in locales/en.yaml)
cli-config.yaml.example for the new config key