Problem or Use Case
Some Hermes config keys are profile-global but should logically vary by active model. The two clearest cases today:
model.context_length: used both for the harness's pre-flight context check and for compression triggers. Some OpenRouter providers serve a model with a smaller context window than its native size (e.g. certain providers ship kimi-k2.6 with 32K despite the model's native 256K; see #24268, "Wrong context length for kimi-k2.6 family: OpenRouter returns 32K, overrides correct hardcoded 256K default"). The workaround is model.context_length: <native>, but that value is flat: it applies to whatever model is active. Switch models and the override now applies to the wrong model.
provider_routing.{only,order,ignore,sort}: used to pin OpenRouter providers. The natural fix for the case above is provider_routing.only: [<long-context providers>], but again, this is flat. A user running multiple models who only wants pinning for one of them has no clean way to express that; they have to flip the flat key every time they switch.
The result: the pair (context_length override + provider_routing.only) has to travel together when you switch the active model, and there's no way to say "this pair belongs to model X, that pair belongs to model Y" in one config. Switching with /model therefore becomes error-prone manual coordination, and sporadic tool-call failures appear when the harness's check runs with the wrong model's expectations.
This same shape covers a class of related reports: #24140 (P1, "context window below minimum 64K"), #24000 (P2, nous fallback to 32K), #24072 (P2, model.context_length persists across /model switches).
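To make the failure mode concrete, here is a minimal sketch of how flat keys behave once the active model changes. The `effective_settings` helper and the `example/other-model` id are hypothetical and only illustrate the shape of the problem; they are not Hermes code.

```python
# Flat config as described above: one context_length and one provider pin
# for the whole profile, tuned with kimi-k2.6 in mind.
flat_config = {
    "model": {"default": "moonshotai/kimi-k2.6", "context_length": 256000},
    "provider_routing": {"only": ["together", "groq"]},
}


def effective_settings(config: dict, active_model: str) -> dict:
    """Hypothetical lookup mirroring flat-key behaviour.

    The active model plays no part in the lookup, which is the problem.
    """
    return {
        "model": active_model,
        "context_length": config["model"]["context_length"],
        "only": config["provider_routing"]["only"],
    }


tuned = effective_settings(flat_config, "moonshotai/kimi-k2.6")
# Hypothetical second model the user switches to later.
switched = effective_settings(flat_config, "example/other-model")

# Both models end up with the override and the pin meant for kimi-k2.6.
print(tuned["context_length"], switched["context_length"])
```

Under these assumptions the second model inherits a 256000-token override and a provider pin that were only ever meant for kimi-k2.6.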
Proposed Solution
Two opt-in, backward-compatible overlay schemas, both following the same per-model-overlay precedent already in the codebase (model.custom_providers.<name>.models.<id>.context_length, providers.<name>.models.<id>.timeout_seconds):
    model:
      default: moonshotai/kimi-k2.6
      context_length: 128000  # default for unmatched models
      models:  # NEW overlay
        moonshotai/kimi-k2.6:
          context_length: 256000
    provider_routing:
      sort: throughput
      models:  # NEW overlay
        moonshotai/kimi-k2.6:
          only: ["together", "groq"]
model.models.<id>.context_length wins over flat model.context_length when the active model is <id>.
provider_routing.models.<id>.<key> wins over flat provider_routing.<key> when the active model is <id>. Unspecified per-model keys fall through to flat defaults.
Resolution at agent-init time. Mid-session /model switching is out of scope for the first cut (left to follow-up; PR #24079 addresses the live-switch path independently).
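The precedence rules above could be sketched roughly as follows. This is an illustration only: `resolve_for_model` is a hypothetical helper name, the config is shown as an already-parsed dict, and `example/other-model` is a made-up id standing in for any model without an overlay entry.

```python
from typing import Any, Mapping


def resolve_for_model(section: Mapping[str, Any], model_id: str) -> dict[str, Any]:
    """Merge a section's flat keys with its per-model overlay for model_id.

    Hypothetical helper: name and signature are illustrative, not Hermes code.
    """
    # Flat defaults: every key except the overlay table itself.
    resolved = {key: value for key, value in section.items() if key != "models"}
    # Per-model keys win; keys the overlay omits fall through to the flat values.
    overlay = (section.get("models") or {}).get(model_id) or {}
    resolved.update(overlay)
    return resolved


# Parsed form of the example config from the proposal.
config = {
    "model": {
        "default": "moonshotai/kimi-k2.6",
        "context_length": 128000,
        "models": {"moonshotai/kimi-k2.6": {"context_length": 256000}},
    },
    "provider_routing": {
        "sort": "throughput",
        "models": {"moonshotai/kimi-k2.6": {"only": ["together", "groq"]}},
    },
}

kimi = "moonshotai/kimi-k2.6"
other = "example/other-model"  # hypothetical id with no overlay entry

print(resolve_for_model(config["model"], kimi)["context_length"])   # 256000
print(resolve_for_model(config["model"], other)["context_length"])  # 128000
print(resolve_for_model(config["provider_routing"], kimi))
# {'sort': 'throughput', 'only': ['together', 'groq']}
print(resolve_for_model(config["provider_routing"], other))
# {'sort': 'throughput'}
```

Because the merge is a plain key-level update, a config with no models table resolves exactly as it does today, which is what keeps the overlay opt-in and backward-compatible in this sketch.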
Alternatives Considered
agent/model_metadata.py (the approach proposed in #24268, "Wrong context length for kimi-k2.6 family: OpenRouter returns 32K, overrides correct hardcoded 256K default"). Fixes the specific kimi-k2.6 case but doesn't generalize to other future cases of the same shape, doesn't help users who hit a brand-new mis-reporting provider, and doesn't address the provider-pinning half of the problem.
providers.<name>.context_length (currently being explored in #20847, "feat: read context_length from providers.<name>.context_length (Step 0c)"). Helps when the divergence is per-provider; doesn't help when two models on the same provider need different settings.
One profile per model. Works with the current schema (set the flat keys for each profile's chosen model), but multiplies operational overhead: separate data dirs, separate Discord bot tokens or careful coordination to share one, and conversation history that splits per variant. Heavy for what's really "this model needs different profile-global numbers."
The overlay approach is the smallest schema change that handles all the above cases and composes naturally with future per-model keys (e.g. #15037 requests per-model max_tokens and would use the same shape).
Feature Type
Configuration option
Scope
Small (single file, < 50 lines)
Contribution
Debug Report (optional)