fix(gateway): propagate max_tokens from config.yaml to AIAgent (#20741) by teknium1 · Pull Request #39864 · NousResearch/hermes-agent

teknium1 · 2026-06-05T14:04:12Z

Summary

Setting max_tokens in config.yaml now actually caps model output for gateway-spawned agents — fixing the "Response truncated due to output length limit" reports (#20741). Previously the value was read but never propagated to AIAgent, so providers without a hardcoded default (OpenRouter :free models, Ollama Cloud, custom OpenAI-compatible endpoints) fell back to the server's short default and truncated long generations.

Salvages @ViewWay's #20804 (cherry-picked, authorship preserved) and widens it to the per-provider config surface that #19782 (@alexcam1901) targeted.

Changes

cli.py / gateway/run.py (ViewWay): read model.max_tokens and pass it to AIAgent across CLI init, CLI background, gateway runtime, and session-override paths. The HERMES_MAX_TOKENS env var serves as the internal override mechanism (config.yaml stays the documented surface).
hermes_cli/runtime_provider.py (widening): a custom_providers entry can pin its own cap via max_output_tokens (or max_tokens). Lifted onto the resolved runtime at all three _get_named_custom_provider return sites + the pooled-credential path.
gateway/run.py (widening): per-provider cap is used only when the documented global model.max_tokens isn't set, so the global key always wins.
scripts/release.py: AUTHOR_MAP entry for ViewWay.
tests/gateway/test_max_tokens_propagation.py: regression tests for the full precedence chain.
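
The per-provider widening described for hermes_cli/runtime_provider.py can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `provider_output_cap` and `lift_cap` are hypothetical helpers, the key name written onto the resolved runtime is an assumption, and so is the choice to prefer `max_output_tokens` over `max_tokens` when an entry sets both.

```python
from typing import Any, Mapping, Optional


def provider_output_cap(entry: Mapping[str, Any]) -> Optional[int]:
    """Return the output cap pinned by a custom_providers entry, if any.

    Assumption: max_output_tokens is checked before its max_tokens alias.
    """
    for key in ("max_output_tokens", "max_tokens"):
        value = entry.get(key)
        # Accept only positive integers; bool is excluded because it subclasses int.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


def lift_cap(entry: Mapping[str, Any], runtime: Mapping[str, Any]) -> dict:
    """Copy the resolved runtime and attach the provider cap when one is set."""
    resolved = dict(runtime)
    cap = provider_output_cap(entry)
    if cap is not None:
        # Hypothetical key name on the resolved runtime.
        resolved["max_output_tokens"] = cap
    return resolved
```

Applying the same helper at every return site is what keeps the paths consistent; an entry without a cap leaves the runtime untouched, so no cap is invented.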

Precedence

HERMES_MAX_TOKENS > model.max_tokens > per-provider max_output_tokens > None
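
As a rough sketch of that chain (not the gateway's actual code), the first value that is set wins. `resolve_max_tokens` is a hypothetical name, and treating an empty or whitespace-only env value as unset is an assumption.

```python
from typing import Optional


def resolve_max_tokens(
    env_value: Optional[str],
    global_cap: Optional[int],
    provider_cap: Optional[int],
) -> Optional[int]:
    """Return the first cap that is set, in precedence order.

    env_value    -- raw HERMES_MAX_TOKENS string, or None if unset
    global_cap   -- model.max_tokens from config.yaml, or None
    provider_cap -- per-provider max_output_tokens, or None
    """
    # Assumption: an empty or whitespace-only env var counts as unset.
    if env_value is not None and env_value.strip():
        return int(env_value)
    if global_cap is not None:
        return global_cap
    # Falls through to None when nothing is configured, so no cap is invented.
    return provider_cap
```

For example, `resolve_max_tokens("2048", 16384, 12000)` returns 2048, and `resolve_max_tokens(None, None, None)` returns `None`.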

Validation

| Scenario | Result |
| --- | --- |
| Top-level `model.max_tokens: 16384` | propagated (16384) |
| Per-provider `max_output_tokens: 12000`, no global | fallback (12000) |
| Both set | global wins (16384) |
| `HERMES_MAX_TOKENS=2048` + both | env wins (2048) |
| Nothing set | `None` (no spurious cap) |

E2E verified through the real _resolve_runtime_agent_kwargs() with isolated HERMES_HOME. Targeted suite: 133 passed (test_max_tokens_propagation + test_runtime_provider_resolution).
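
One way to get the kind of isolation mentioned above is to run the resolver under a throwaway HERMES_HOME with a cleared environment. The sketch below is illustrative only: `resolve_cap_from_env` is a hypothetical stand-in for the real `_resolve_runtime_agent_kwargs()`, and passing the config in as a dict (rather than reading config.yaml from disk) is a simplification.

```python
import os
import tempfile
from typing import Dict, Optional
from unittest import mock


def resolve_cap_from_env(config: dict) -> Optional[int]:
    """Hypothetical stand-in resolver: env override first, then model.max_tokens."""
    raw = os.environ.get("HERMES_MAX_TOKENS")
    if raw:
        return int(raw)
    return (config.get("model") or {}).get("max_tokens")


def run_isolated(config: dict, env: Dict[str, str]) -> Optional[int]:
    """Resolve the cap with a temporary HERMES_HOME and only the given env vars."""
    with tempfile.TemporaryDirectory() as home:
        isolated = {"HERMES_HOME": home, **env}
        # clear=True drops any HERMES_MAX_TOKENS leaking in from the caller's
        # shell; patch.dict restores the original environment on exit.
        with mock.patch.dict(os.environ, isolated, clear=True):
            return resolve_cap_from_env(config)
```

Clearing the environment matters here because a stray HERMES_MAX_TOKENS in a developer's shell would otherwise mask the config-driven cases.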

Scope note

The CLI honors the documented global model.max_tokens fully. Per-provider max_output_tokens on the CLI path is not wired here — it requires capturing the resolved-provider dict during cli.py init (a >1500-line critical file), and #35518 is reworking that surface wholesale. Gateway is the path the truncation reports come from.

Closes #20741. Supersedes #20804, #19782.

Co-authored-by: ViewWay 834740219@qq.com


max_tokens set under model: in config.yaml was silently ignored. The value was never read from config, never passed through _resolve_runtime_agent_kwargs(), _resolve_turn_agent_config(), or the session override path. Added it to all three code paths so custom/Ollama endpoints receive the correct output cap. Closes #20741

…r override

Previous commit only covered the gateway runtime path. This adds:
- CLI __init__: read max_tokens from model config with HERMES_MAX_TOKENS env override
- CLI AIAgent() calls (interactive + background): pass max_tokens
- Gateway _resolve_runtime_agent_kwargs: add HERMES_MAX_TOKENS env override

All three code paths (CLI, gateway runtime, session override) now consistently propagate max_tokens to AIAgent.

Widens ViewWay's #20741 fix to the sibling config surface: a custom_providers entry can pin its own output cap via max_output_tokens (or max_tokens). _get_named_custom_provider now lifts it onto the resolved runtime at all three return sites, and the gateway uses it as a fallback only when the documented global model.max_tokens isn't set, so the global key always wins. Precedence: HERMES_MAX_TOKENS > model.max_tokens > provider max_output_tokens > None. Closes the same #20741 truncation for users who configure the cap per-provider rather than globally. Picks up the intent of #19782 (alexcam1901), reimplemented to feed ViewWay's max_tokens pipeline.


github-actions · 2026-06-05T14:05:13Z

🔎 Lint report: `hermes/hermes-56ef86e0` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9869 on HEAD, 9868 on base (🆕 +1)

🆕 New issues (1):

| Rule | Count |
| --- | --- |
| `unresolved-import` | 1 |

First entries

tests/gateway/test_max_tokens_propagation.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 5116 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

alt-glitch · 2026-06-05T14:39:38Z

Part of the max_tokens-propagation cluster — overlaps with open PRs #20769, #20804 (salvaged here), and #35518, all addressing #20741 (config.yaml max_tokens never reaching AIAgent). Flagging for maintainer dedup.

ViewWay and others added 5 commits June 5, 2026 07:00

test(gateway): regression tests for max_tokens propagation chain (#20741)

1a4580d

chore(release): map ViewWay author email for AUTHOR_MAP

41495cf

alt-glitch added labels Jun 5, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), comp/cli (CLI entry point, hermes_cli/, setup wizard), area/config (Config system, migrations, profiles)

teknium1 merged commit e837856 into main Jun 5, 2026
23 checks passed

teknium1 deleted the hermes/hermes-56ef86e0 branch June 5, 2026 16:10
