feat(config): add per-model max_tokens overlay by digitalbase · Pull Request #29705 · NousResearch/hermes-agent

digitalbase · 2026-05-21T07:21:36Z

If you use providers like OpenRouter or Bedrock, a single max_tokens setting isn't enough: the limit needs to vary with the model.

Summary

Adds a model.models.<id>.max_tokens overlay so a single profile can switch between models with different output-token ceilings without mutating the flat model.max_tokens fallback.

model:
  default: anthropic/claude-opus-4.6
  max_tokens: 8192              # fallback for unmatched models
  models:
    anthropic/claude-opus-4.6:
      max_tokens: 32768
    gpt-5.5:
      max_tokens: 65536

Resolution order is:

1. Explicit constructor/programmatic max_tokens
2. model.models.<active_model>.max_tokens
3. Flat model.max_tokens
4. Provider/model default
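To make the precedence concrete, here is a minimal sketch of the lookup in Python. It is not the code from agent/agent_init.py: the function names, the signature, and the choice to let an invalid explicit value fall through to the next level are assumptions made for illustration. Only the ordering and the positive-int check come from the description above.

```python
from typing import Any, Mapping, Optional

# Example config mirroring the YAML in the summary (the parsed `model:` section).
CONFIG = {
    "default": "anthropic/claude-opus-4.6",
    "max_tokens": 8192,
    "models": {
        "anthropic/claude-opus-4.6": {"max_tokens": 32768},
        "gpt-5.5": {"max_tokens": 65536},
    },
}


def _positive_int(value: Any) -> Optional[int]:
    """Return value if it is a positive int (bools excluded), else None."""
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return None


def resolve_max_tokens(
    explicit: Any,
    model_cfg: Mapping[str, Any],
    active_model: str,
    provider_default: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: pick the first valid candidate in precedence order."""
    candidates = [explicit]  # 1. constructor/programmatic value

    models = model_cfg.get("models")
    if isinstance(models, Mapping):
        entry = models.get(active_model)
        if isinstance(entry, Mapping):
            candidates.append(entry.get("max_tokens"))  # 2. per-model overlay

    candidates.append(model_cfg.get("max_tokens"))  # 3. flat fallback
    candidates.append(provider_default)  # 4. provider/model default

    for candidate in candidates:
        valid = _positive_int(candidate)
        if valid is not None:
            return valid
    return None
```

With this sketch, an unmatched model resolves to the flat 8192, and an overlay entry with an invalid value (for example -1) is skipped in favor of the flat fallback rather than raising.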

Why this is not a duplicate

Related PRs solve adjacent scopes, but not this specific config shape:

#24495 (feat(config): per-model context_length and provider_routing overrides) establishes the same model.models.<id> overlay pattern for context_length and provider_routing, and explicitly notes max_tokens as a natural future extension. This PR is that focused extension for output-token caps.
#28142 (fix(agent_init): read max_tokens from custom_providers per-model config) and #28786 (feat: per-provider max_tokens via custom_providers models.<model>.max_tokens) target custom_providers[].models.<model>.max_tokens, which helps custom-provider entries but not built-in providers like Bedrock/Anthropic/OpenAI or normal model switching within one profile.
#20769 (fix(config): propagate max_tokens from config.yaml to AI transport (#20741)) covers flat/global model.max_tokens propagation, but not per-active-model overrides.

Changes

agent/agent_init.py: resolve model.models.<active>.max_tokens before flat model.max_tokens, preserving constructor precedence and positive-int validation.
gateway/run.py: include model.models in gateway agent cache-busting keys so edits to overlays rebuild cached agents.
cli-config.yaml.example: document the per-model output-token overlay.
Tests cover override wins, unmatched fallback, invalid per-model fallback, constructor precedence, and gateway cache invalidation.
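The cache-busting change can be illustrated with a small sketch. This is not the code in gateway/run.py: the function name, the fields included in the key, and the use of a SHA-256 digest over canonical JSON are all assumptions. It only shows the idea that the overlay map is part of the key, so editing it yields a different key and forces a rebuild of the cached agent.

```python
import hashlib
import json
from typing import Any, Mapping


def agent_cache_key(active_model: str, model_cfg: Mapping[str, Any]) -> str:
    """Hypothetical cache key that changes whenever the overlay changes."""
    relevant = {
        "active_model": active_model,
        "max_tokens": model_cfg.get("max_tokens"),
        # Including model.models means overlay edits invalidate the cache.
        "models": model_cfg.get("models") or {},
    }
    # sort_keys gives a canonical form, so dict ordering does not affect the key.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


before = {"max_tokens": 8192, "models": {"gpt-5.5": {"max_tokens": 65536}}}
after = {"max_tokens": 8192, "models": {"gpt-5.5": {"max_tokens": 32768}}}

key_before = agent_cache_key("gpt-5.5", before)
key_after = agent_cache_key("gpt-5.5", after)
```

Here key_before and key_after differ because only the overlay value changed, which is the situation the gateway cache-invalidation test is described as covering.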

feat(config): add per-model max_tokens overlay

c9dee52

alt-glitch added labels on May 21, 2026: type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), area/config (Config system, migrations, profiles), P3 (Low: cosmetic, nice to have)

digitalbase mentioned this pull request on May 21, 2026 in feat(config): per-model context_length and provider_routing overrides #24495 (open, 13 tasks)

alt-glitch mentioned this pull request on May 30, 2026 in feat(custom_providers): per-model max_tokens with switch/fallback re-resolution #35518 (open)

digitalbase wants to merge 1 commit into NousResearch:main from digitalbase:feat/per-model-max-tokens-overlay
