feat(config): per-model context_length and provider_routing overrides by samplesabotage · Pull Request #24495 · NousResearch/hermes-agent

samplesabotage · 2026-05-12T18:44:23Z

What does this PR do?

Adds two opt-in, fully backward-compatible overlay schemas that let a single config declare model-specific profile-globals without mutating the flat defaults other models inherit:

model.models.<id>.context_length — wins over flat model.context_length when the active model is <id>. Resolution order: per-model override → flat → custom_providers per-model → auto-detect.
provider_routing.models.<id>.<key> — wins over flat provider_routing.<key> when the active model is <id>. Unspecified per-model keys fall through to flat defaults.
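
The first two steps of the context_length resolution order can be sketched as below. This is a minimal illustration, not the code in run_agent.py: the function names `resolve_context_length` and `_parse_context_length` are hypothetical, and the custom_providers and auto-detect steps are left to the caller (signalled here by returning `None`).

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def _as_dict(value: Any) -> dict:
    """Treat anything that is not a mapping (None, list, str) as empty."""
    return value if isinstance(value, dict) else {}


def _parse_context_length(value: Any, source: str) -> Optional[int]:
    """Return a positive int, or None after logging a warning."""
    try:
        if isinstance(value, bool):  # bool is an int subclass; reject it
            raise ValueError("boolean is not a token count")
        parsed = int(value)  # accepts 256000 and "256000"
        if parsed <= 0:
            raise ValueError("must be positive")
    except (TypeError, ValueError) as exc:
        logger.warning("Ignoring invalid %s=%r (%s)", source, value, exc)
        return None
    return parsed


def resolve_context_length(config: dict, model_id: str) -> Optional[int]:
    """Hypothetical helper: per-model override first, then the flat value."""
    model_cfg = _as_dict(config.get("model"))
    per_model = _as_dict(_as_dict(model_cfg.get("models")).get(model_id))

    candidates = [
        (f"model.models.{model_id}.context_length", per_model.get("context_length")),
        ("model.context_length", model_cfg.get("context_length")),
    ]
    for source, raw in candidates:
        if raw is None:
            continue
        parsed = _parse_context_length(raw, source)
        if parsed is not None:
            return parsed
    # Nothing usable in config: the caller would continue with the
    # custom_providers per-model value and then auto-detection.
    return None
```

An invalid per-model value is warned about and skipped rather than raised, so the flat default still applies; that mirrors the "invalid-value warns + falls back" behavior the tests describe, under the assumption that a warning is logged rather than shown some other way.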

Why this approach

On OpenRouter, some providers serve a given model with a smaller context window than its native size — e.g. certain providers ship Kimi K2.6 with a 32K window despite the model's native 256K. The workaround is two profile-global settings travelling together:

provider_routing.only: [...] to pin providers that serve the full window.
model.context_length: <native> so the harness's expectation matches.

Both are flat in the current schema, so switching the active model without flipping both yields sporadic tool-call failures when the harness's context check runs against the larger flat default. With this patch the two settings travel together per model:

model:
  default: moonshotai/kimi-k2.6
  context_length: 128000           # default for unmatched models
  models:
    moonshotai/kimi-k2.6:
      context_length: 256000

provider_routing:
  sort: throughput
  models:
    moonshotai/kimi-k2.6:
      only: ["together", "groq"]

The overlay shape is consistent with precedent already in the codebase: model.custom_providers.<name>.models.<id>.context_length and providers.<name>.models.<id>.timeout_seconds.
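
For the provider_routing side, the fall-through behavior amounts to a shallow merge of the per-model table over the flat keys. The sketch below assumes the config has already been loaded into a plain dict; `resolve_provider_routing` is a hypothetical name, not the function in cli.py.

```python
from typing import Any, Dict


def resolve_provider_routing(config: dict, model_id: str) -> Dict[str, Any]:
    """Hypothetical helper: flat provider_routing keys with the per-model overlay on top."""
    routing = config.get("provider_routing")
    if not isinstance(routing, dict):
        return {}

    # Flat keys are everything except the reserved "models" overlay table.
    effective = {key: value for key, value in routing.items() if key != "models"}

    models = routing.get("models")
    overlay = models.get(model_id) if isinstance(models, dict) else None
    if isinstance(overlay, dict):
        # Merge by key presence, not truthiness, so an explicit empty list
        # in the overlay still replaces the flat value.
        effective.update(overlay)
    return effective


config = {
    "provider_routing": {
        "sort": "throughput",
        "models": {"moonshotai/kimi-k2.6": {"only": ["together", "groq"]}},
    }
}
kimi = resolve_provider_routing(config, "moonshotai/kimi-k2.6")
other = resolve_provider_routing(config, "some/other-model")
```

With the example config, `kimi` carries both `sort` and `only`, while `other` carries only the flat `sort`, so the pin does not leak to other models.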

Related Issues and PRs

Fixes #24493.

Also searched open issues + PRs; this PR is related to the following but not a duplicate of any:

Wrong context length for kimi-k2.6 family: OpenRouter returns 32K, overrides correct hardcoded 256K default #24268 (P1). This PR is a complementary fix: #24268 proposes hardcoded defaults in model_metadata.py for the specific kimi models; this PR adds a user-configurable per-model override that works for any model OpenRouter mis-reports, plus the provider-pinning angle (provider_routing.models.<id>.only), which #24268 doesn't address. Both could land independently.
All models rejected with "context window below minimum 64,000 tokens" — Telegram completely down #24140 (P1) and provider: nous falls back to 32,768-token context, blocking boot with model.context_length workaround required #24000 (P2). These are broader manifestations of the "context window below minimum 64K" failures. This PR doesn't fix the underlying detection bugs, but it gives users a clean per-model workaround without polluting flat defaults.
Bug Report: model.context_length persists across /model provider switches #24072 (P2), with companion PR fix: clear stale _config_context_length when switching providers #24079. Mid-session /model switching doesn't re-resolve _config_context_length. This matches the scope note below: this PR resolves overlays at agent-init time only, while #24079 addresses the live-switch path and is complementary, not conflicting.
fix(context): respect custom provider per-model context_length overrides #24460 (PR, open). Fixes a bug in the existing model.custom_providers.<name>.models.<id>.context_length resolution order. Different code path; no conflict with this PR's new model.models.<id> overlay.
feat(config): support per-model max_tokens in custom_providers config #15037 (P3 feature). Adjacent: the overlay pattern introduced here naturally extends to max_tokens and other model-globals as future work.

Type of Change

✨ New feature (non-breaking change that adds functionality)

Changes Made

run_agent.py: per-model model.models.<id>.context_length resolved before flat model.context_length; existing custom_providers branch and warning behavior preserved.
cli.py: per-model provider_routing.models.<id>.* overlay on top of flat keys; unspecified per-model keys fall through.
cli-config.yaml.example: schema docs for both new overlays with motivating examples.
tests/run_agent/test_per_model_context_length.py (new): 6 tests covering per-model wins, flat fallback, invalid-value warns + falls back, missing models: no-op, string-int parsing.
tests/cli/test_cli_provider_resolution.py: 4 new tests covering per-model overlay wins, no-leak to other models, empty-list honored, no-overlay no-op.

How to Test

Apply the patch.
Write a config with both flat and per-model entries (see example above).
Run pytest tests/run_agent/test_per_model_context_length.py tests/cli/test_cli_provider_resolution.py -q — all 10 new tests pass.
Regression sweep: pytest tests/run_agent/test_invalid_context_length_warning.py tests/run_agent/test_switch_model_context.py tests/run_agent/test_compression_feasibility.py tests/cli/test_cli_provider_resolution.py tests/agent/test_model_metadata.py -q — all 146 tests pass (10 new + 136 pre-existing).
End-to-end: start Hermes with a config whose active model has a per-model context_length override, and verify via /info (or equivalent) that the effective context_length reflects the override, not the flat default.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (feat(config): ...)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this feature
I've run pytest tests/ -q on the touched modules and all tests pass
I've added tests for my changes
I've tested on my platform: Debian 13 (Linux 6.12)

Documentation & Housekeeping

I've updated relevant documentation (cli-config.yaml.example)
I've updated cli-config.yaml.example for the new config keys
N/A — no architecture/workflow changes
N/A — pure config schema addition, no platform-specific code
N/A — no tool description/schema changes

Scope note

Both overlays resolve at agent-init time, so the active value is correct from container/process start. Mid-session /model switching does not currently re-resolve these overlays (would require touching the switch_model path) — happy to follow up if maintainers want it.

Add two overlay schemas that let a single config declare model-specific profile-globals without mutating the flat defaults other models inherit:

* `model.models.<id>.context_length`: wins over flat `model.context_length` when the active model is <id>. Resolution order is now: per-model override → flat → custom_providers per-model → auto-detect.
* `provider_routing.models.<id>.<key>`: wins over flat `provider_routing.<key>` when the active model is <id>. Unspecified per-model keys fall through to flat defaults, so callers can override e.g. just `only:` for one model without restating the rest.

Motivating case: on OpenRouter, some providers serve a given model with a smaller context window than the model's native size. Pinning providers via `provider_routing.only` fixes routing, and a matching `context_length` override pins the harness's expectations, but today both settings are profile-global, so switching the active model without flipping both yields sporadic tool-call failures when the harness's context check runs against the larger flat default. With this patch the two settings travel together per model.

Schema precedent already exists: `model.custom_providers.<name>.models.<id>.context_length` and `providers.<name>.models.<id>.timeout_seconds`. These overlays follow the same pattern, scoped to keys that are otherwise flat.

Tests: 10 new (6 context_length, 4 provider_routing); 129 pre-existing tests across the touched modules continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

digitalbase · 2026-05-21T07:28:17Z

Good stuff, but this only covers context_length. I bumped into the same issue with max_tokens and filed PR #29705.

iamfoz · 2026-06-02T23:38:13Z

Heads-up: I've opened #37720 for the per-request router-swap case. Your config and provider_routing overrides set the intended context length up front. Mine reads response.model after each call and re-budgets the compressor when a router (openrouter/auto, :free, fallback chains) silently serves a different backend than configured. Meant as complementary to this, not a competitor, so flagging it to avoid duplicated effort.

This was referenced May 21, 2026

fix(agent_init): read max_tokens from custom_providers per-model config #28142

Closed

feat(config): add per-model max_tokens overlay #29705

Open

This was referenced Jun 2, 2026

[Feature]: Re-budget the context compressor when a router serves a different backend per request #37719

Open

feat(agent): re-budget context compressor when a router swaps the backend #37720

Open

samplesabotage wants to merge 1 commit into NousResearch:main from samplesabotage:feat/per-model-config-overrides
