feat(agent): re-budget context compressor when a router swaps the backend#37720

Open
iamfoz wants to merge 1 commit into NousResearch:main from iamfoz:feat/adaptive-context-window

@iamfoz iamfoz commented Jun 2, 2026

What does this PR do?

Routers like openrouter/auto, :free-suffixed names, and model fallback chains silently pick a different backend per request, and each backend has a different context window. Today Hermes budgets compression once, against the configured model id, so when the router serves a smaller-window backend the agent can blow past the real limit before compaction kicks in, or compact far too early on a larger one.
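
The mismatch can be made concrete with a little arithmetic. The sketch below uses made-up numbers: the 80% compaction ratio and both window sizes are assumptions for illustration, not values taken from Hermes or from any real backend.

```python
# Hypothetical figures: neither the ratio nor the window sizes come from this PR.
COMPACTION_PERCENT = 80  # compact once the context reaches 80% of the window


def compaction_point(context_length: int, percent: int = COMPACTION_PERCENT) -> int:
    """Token count at which compaction would trigger for a given window."""
    return context_length * percent // 100


configured_window = 200_000  # window assumed for the configured router id
served_window = 32_000       # window of the backend the router actually served

stale_point = compaction_point(configured_window)  # budget computed once, up front
real_point = compaction_point(served_window)       # budget the served backend needs

# With the stale budget, compaction would only start far past the real limit.
overflows_before_compaction = stale_point > served_window
```

Under these assumed numbers the stale budget waits until 160,000 tokens, while the served backend needs compaction at 25,600. The reverse swap, to a larger window, would compact far earlier than necessary.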

The existing context-length work all re-budgets on an explicit, known-ahead event: static config overrides (#24495, #37548), or a manual trigger like /model (#36199), /new (#31067), or session reset (#31492). None of them can see a silent per-request swap, because the only signal of which backend actually served a call is the model field on the response.

This PR adds an AdaptiveContextTracker that reads response.model after each successful call. When the backend changes, it looks up the new backend's context_length and re-budgets the compressor via ContextCompressor.update_model() to match the real window.
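
A minimal sketch of that flow follows, under stated assumptions. `CONTEXT_LENGTHS`, `StubCompressor`, and `rebudget_on_swap` are hypothetical stand-ins, not code from this PR: the real lookup goes through agent.model_metadata, and the actual signature of ContextCompressor.update_model() may differ. The sketch also assumes the first observed backend is treated as a baseline and does not trigger a re-budget.

```python
from typing import Optional

# Hypothetical lookup table standing in for agent.model_metadata.
CONTEXT_LENGTHS = {"vendor-a/large": 200_000, "vendor-b/small": 32_000}


class StubCompressor:
    """Stand-in for ContextCompressor; the real update_model() may differ."""

    def __init__(self, model: str, context_length: int) -> None:
        self.model = model
        self.context_length = context_length

    def update_model(self, model: str, context_length: int) -> None:
        self.model = model
        self.context_length = context_length


def rebudget_on_swap(last_model: Optional[str], response_model: str,
                     compressor: StubCompressor) -> str:
    """Re-budget when the served backend changes; return the model now tracked."""
    if last_model is None or response_model == last_model:
        return response_model  # baseline or unchanged: nothing to re-budget
    context_length = CONTEXT_LENGTHS.get(response_model)
    if context_length is not None:  # unknown backend: keep the current budget
        compressor.update_model(response_model, context_length)
    return response_model


compressor = StubCompressor("vendor-a/large", 200_000)
tracked: Optional[str] = None
for served in ["vendor-a/large", "vendor-a/large", "vendor-b/small"]:
    tracked = rebudget_on_swap(tracked, served, compressor)
```

After the third response the stub compressor is budgeted against the 32,000-token window of the backend that actually served the call.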

It is complementary to #24495, not competing. Config overrides set your defaults; this corrects the budget when the router ignores them at runtime.

The whole feature is gated behind compression.adaptive_context_window (default false), so it is a guaranteed no-op for everyone who has not opted in. The observe hook is defensive: any exception in the adaptive path is logged at debug and swallowed, so the agent loop is never interrupted by it.
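
The gating and the defensive hook could look roughly like the sketch below. The helper names `adaptive_enabled` and `safe_observe` are illustrative assumptions, not functions from this PR. Only the config key, the debug-level logging, and the swallow-everything behaviour come from the description above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def adaptive_enabled(config: dict) -> bool:
    """Read compression.adaptive_context_window, defaulting to False."""
    compression = config.get("compression") or {}
    return bool(compression.get("adaptive_context_window", False))


def safe_observe(tracker: Any, response_model: Any,
                 on_change: Callable[[str], None]) -> None:
    """Run the adaptive path without letting any failure reach the agent loop."""
    if tracker is None:  # feature disabled: the tracker was never created
        return
    try:
        changed = tracker.observe(response_model)
        if changed is not None:
            on_change(changed)
    except Exception:
        # Log at debug and swallow, so the agent loop is never interrupted.
        logger.debug("adaptive context re-budget failed", exc_info=True)
```

Leaving the tracker unset when the flag is off means the disabled path reduces to a single `None` check per call.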

Related Issue

Fixes #37719

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)

Changes Made

  • agent/adaptive_context.py (new): AdaptiveContextTracker. observe(response_model) returns the new backend id on a change, else None, plus last_seen(), change_count(), summary(). Stateful, no I/O.
  • agent/agent_init.py: instantiate and attach the tracker when compression.adaptive_context_window is enabled, otherwise leave it unset.
  • agent/conversation_loop.py: after each successful API call, feed response.model to the tracker. On a change, resolve the backend's context_length via agent.model_metadata and call ContextCompressor.update_model(). Wrapped in a defensive try/except (debug-logged, swallowed).
  • cli.py and gateway/run.py: /usage surfaces live router-tracking state (baseline backend, change count, current backend) when the feature is enabled.
  • cli-config.yaml.example: documents the new compression.adaptive_context_window key (default false).
  • locales/en.yaml: strings for the /usage tracking summary.
  • tests/agent/test_adaptive_context.py (new): 10 tests.
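
For readers who want a feel for the tracker's contract, here is a hedged sketch of a class with the same method names. `TrackerSketch` is a hypothetical reimplementation written for illustration. The internals, the input validation, and the keys returned by `summary()` are assumptions and may not match agent/adaptive_context.py.

```python
from typing import Any, Dict, Optional


class TrackerSketch:
    """Illustrative stand-in for AdaptiveContextTracker: stateful, no I/O."""

    def __init__(self) -> None:
        self._baseline: Optional[str] = None
        self._last: Optional[str] = None
        self._changes = 0

    def observe(self, response_model: Any) -> Optional[str]:
        """Return the new backend id on a change, else None."""
        if not isinstance(response_model, str) or not response_model.strip():
            return None  # invalid input is ignored (assumed guard)
        model = response_model.strip()
        if self._last is None:
            # First valid observation becomes the baseline, not a change.
            self._baseline = self._last = model
            return None
        if model == self._last:
            return None
        self._last = model
        self._changes += 1
        return model

    def last_seen(self) -> Optional[str]:
        return self._last

    def change_count(self) -> int:
        return self._changes

    def summary(self) -> Dict[str, Any]:
        # Key names are assumptions; the PR only says /usage shows these values.
        return {"baseline": self._baseline, "current": self._last,
                "changes": self._changes}
```

Keeping the tracker free of I/O means the metadata lookup and the compressor update stay in the conversation loop, where the defensive try/except wraps them.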

How to Test

Automated:

pytest tests/agent/test_adaptive_context.py -q

Covers first-observation baseline, no-op on same model, change detection with counter increment, multiple and router-to-concrete transitions, invalid-input guards, and the /usage summary shape.

Manual (real router):

  1. In ~/.hermes/config.yaml set compression.adaptive_context_window: true and point the model at a router such as openrouter/auto.
  2. Run a session long enough that the router serves more than one backend.
  3. /usage shows the tracked baseline backend and a non-zero change count once the backend swaps. Compression thresholds re-budget to the new backend's window, visible as a changed compaction point in debug logs.

No-op check (proves zero impact when disabled):

  4. Leave compression.adaptive_context_window unset or false and confirm behaviour matches main. The tracker is never instantiated.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs to make sure this isn't a duplicate (closest is #24495, "feat(config): per-model context_length and provider_routing overrides"; this PR handles the per-request router-swap case that config overrides cannot see)
  • My PR contains only changes related to this feature
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS Tahoe 26.5 (Mac Mini M2, production)

Documentation & Housekeeping

  • Documentation updated (config key in cli-config.yaml.example, /usage strings in locales/en.yaml)
  • Updated cli-config.yaml.example for the new config key
  • N/A, no architecture or workflow change
  • Cross-platform: pure Python, no platform-specific calls
  • N/A, no tool description or schema changes

…et compressor

Re-anchored to agent/adaptive_context.py + agent/agent_init.py +
agent/conversation_loop.py after upstream's refactor.

Adds AdaptiveContextTracker (new module) and wires it into both
init (config flag gated) and the response handling path. After each
successful API call, if response.model differs from the previously
observed value, looks up the new backend's context_length via
agent.model_metadata and calls ContextCompressor.update_model() to
rebudget thresholds.

Behaviour change is gated on compression.adaptive_context_window in
cli-config.yaml (default false), so this is a no-op for everyone who
hasn't opted in. The observe hook is defensive: any exception in the
adaptive path is logged at debug and swallowed so the agent loop is
never interrupted by it.

Helps users whose configured model is a router id (openrouter/auto,
:free-suffix variants, model fallback chains) where the actual
backend selected per request - and its context window - varies.

CLI /usage and gateway /usage both surface live router-tracking state
when the feature is enabled, including baseline/changes summary.
@alt-glitch added the type/feature, P3, and comp/agent labels on Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

[Feature]: Re-budget the context compressor when a router serves a different backend per request

2 participants