Skip to content

feat(config): per-model context_length and provider_routing overrides#24495

Open
samplesabotage wants to merge 1 commit into
NousResearch:mainfrom
samplesabotage:feat/per-model-config-overrides
Open

feat(config): per-model context_length and provider_routing overrides#24495
samplesabotage wants to merge 1 commit into
NousResearch:mainfrom
samplesabotage:feat/per-model-config-overrides

Conversation

@samplesabotage

Copy link
Copy Markdown

What does this PR do?

Adds two opt-in, fully backward-compatible overlay schemas that let a single config declare model-specific profile-globals without mutating the flat defaults other models inherit:

  • model.models.<id>.context_length — wins over flat model.context_length when the active model is <id>. Resolution order: per-model override → flat → custom_providers per-model → auto-detect.
  • provider_routing.models.<id>.<key> — wins over flat provider_routing.<key> when the active model is <id>. Unspecified per-model keys fall through to flat defaults.

Why this approach

On OpenRouter, some providers serve a given model with a smaller context window than its native size — e.g. certain providers ship Kimi K2.6 with a 32K window despite the model's native 256K. The workaround is two profile-global settings travelling together:

  1. provider_routing.only: [...] to pin providers that serve the full window.
  2. model.context_length: <native> so the harness's expectation matches.

Both are flat in the current schema, so switching the active model without flipping both yields sporadic tool-call failures when the harness's context check runs against the larger flat default. With this patch the two settings travel together per model:

model:
  default: moonshotai/kimi-k2.6
  context_length: 128000           # default for unmatched models
  models:
    moonshotai/kimi-k2.6:
      context_length: 256000

provider_routing:
  sort: throughput
  models:
    moonshotai/kimi-k2.6:
      only: ["together", "groq"]

The overlay shape is consistent with precedent already in the codebase: model.custom_providers.<name>.models.<id>.context_length and providers.<name>.models.<id>.timeout_seconds.

Related Issues and PRs

Fixes #24493.

Also searched open issues + PRs; this PR is related to the following but not a duplicate of any:

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)

Changes Made

  • run_agent.py: per-model model.models.<id>.context_length resolved before flat model.context_length; existing custom_providers branch and warning behavior preserved.
  • cli.py: per-model provider_routing.models.<id>.* overlay on top of flat keys; unspecified per-model keys fall through.
  • cli-config.yaml.example: schema docs for both new overlays with motivating examples.
  • tests/run_agent/test_per_model_context_length.py (new): 6 tests covering per-model wins, flat fallback, invalid-value warns + falls back, missing models: no-op, string-int parsing.
  • tests/cli/test_cli_provider_resolution.py: 4 new tests covering per-model overlay wins, no-leak to other models, empty-list honored, no-overlay no-op.

How to Test

  1. Apply the patch.
  2. Write a config with both flat and per-model entries (see example above).
  3. Run pytest tests/run_agent/test_per_model_context_length.py tests/cli/test_cli_provider_resolution.py -q — all 10 new tests pass.
  4. Regression sweep: pytest tests/run_agent/test_invalid_context_length_warning.py tests/run_agent/test_switch_model_context.py tests/run_agent/test_compression_feasibility.py tests/cli/test_cli_provider_resolution.py tests/agent/test_model_metadata.py -q — all 146 tests pass (10 new + 136 pre-existing).
  5. End-to-end: start Hermes with a config whose active model has a per-model context_length override, and verify via /info (or equivalent) that the effective context_length reflects the override, not the flat default.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (feat(config): ...)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this feature
  • I've run pytest tests/ -q on the touched modules and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: Debian 13 (Linux 6.12)

Documentation & Housekeeping

  • I've updated relevant documentation (cli-config.yaml.example)
  • I've updated cli-config.yaml.example for the new config keys
  • N/A — no architecture/workflow changes
  • N/A — pure config schema addition, no platform-specific code
  • N/A — no tool description/schema changes

Scope note

Both overlays resolve at agent-init time, so the active value is correct from container/process start. Mid-session /model switching does not currently re-resolve these overlays (would require touching the switch_model path) — happy to follow up if maintainers want it.

Add two overlay schemas that let a single config declare model-specific
profile-globals without mutating the flat defaults other models inherit:

* `model.models.<id>.context_length` — wins over flat `model.context_length`
  when the active model is <id>. Resolution order is now: per-model override
  → flat → custom_providers per-model → auto-detect.

* `provider_routing.models.<id>.<key>` — wins over flat
  `provider_routing.<key>` when the active model is <id>. Unspecified
  per-model keys fall through to flat defaults, so callers can override
  e.g. just `only:` for one model without restating the rest.

Motivating case: on OpenRouter, some providers serve a given model with
a smaller context window than the model's native size. Pinning providers
via `provider_routing.only` fixes routing, and a matching
`context_length` override pins the harness's expectations — but today
both settings are profile-global, so switching the active model without
flipping both yields sporadic tool-call failures when the harness's
context check runs against the larger flat default. With this patch the
two settings travel together per model.

Schema precedent already exists:
`model.custom_providers.<name>.models.<id>.context_length` and
`providers.<name>.models.<id>.timeout_seconds`. These overlays follow
the same pattern, scoped to keys that are otherwise flat.

Tests: 10 new (6 context_length, 4 provider_routing); 129 pre-existing
tests across the touched modules continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alt-glitch alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard area/config Config system, migrations, profiles provider/openrouter OpenRouter aggregator labels May 12, 2026
@digitalbase

Copy link
Copy Markdown

Good stuff but this is for context_length. I bumped into the same issue with max_tokens and filed a PR #29705

@iamfoz

iamfoz commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Heads-up: I've opened #37720 for the per-request router-swap case. Your config and provider_routing overrides set the intended context length up front. Mine reads response.model after each call and re-budgets the compressor when a router (openrouter/auto, :free, fallback chains) silently serves a different backend than configured. Meant as complementary to this, not a competitor, so flagging it to avoid duplicated effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard P2 Medium — degraded but workaround exists provider/openrouter OpenRouter aggregator type/feature New feature or request

Projects

None yet

4 participants