Skip to content

fix(gateway): propagate max_tokens from config.yaml to AIAgent (#20741)#39864

Merged
teknium1 merged 5 commits into
mainfrom
hermes/hermes-56ef86e0
Jun 5, 2026
Merged

fix(gateway): propagate max_tokens from config.yaml to AIAgent (#20741)#39864
teknium1 merged 5 commits into
mainfrom
hermes/hermes-56ef86e0

Conversation

@teknium1

@teknium1 teknium1 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Setting max_tokens in config.yaml now actually caps model output for gateway-spawned agents — fixing the "Response truncated due to output length limit" reports (#20741). Previously the value was read but never propagated to AIAgent, so providers without a hardcoded default (OpenRouter :free models, Ollama Cloud, custom OpenAI-compatible endpoints) fell back to the server's short default and truncated long generations.

Salvages @ViewWay's #20804 (cherry-picked, authorship preserved) and widens it to the per-provider config surface that #19782 (@alexcam1901) targeted.

Changes

  • cli.py / gateway/run.py (ViewWay): read model.max_tokens and pass it to AIAgent across CLI init, CLI background, gateway runtime, and session-override paths. HERMES_MAX_TOKENS env var as the internal override mechanism (config.yaml stays the documented surface).
  • hermes_cli/runtime_provider.py (widening): a custom_providers entry can pin its own cap via max_output_tokens (or max_tokens). Lifted onto the resolved runtime at all three _get_named_custom_provider return sites + the pooled-credential path.
  • gateway/run.py (widening): per-provider cap is used only when the documented global model.max_tokens isn't set, so the global key always wins.
  • scripts/release.py: AUTHOR_MAP entry for ViewWay.
  • tests/gateway/test_max_tokens_propagation.py: regression tests for the full precedence chain.

Precedence

HERMES_MAX_TOKENS > model.max_tokens > per-provider max_output_tokens > None

Validation

Scenario Result
Top-level model.max_tokens: 16384 propagated (16384)
Per-provider max_output_tokens: 12000, no global fallback (12000)
Both set global wins (16384)
HERMES_MAX_TOKENS=2048 + both env wins (2048)
Nothing set None (no spurious cap)

E2E verified through the real _resolve_runtime_agent_kwargs() with isolated HERMES_HOME. Targeted suite: 133 passed (test_max_tokens_propagation + test_runtime_provider_resolution).

Scope note

The CLI honors the documented global model.max_tokens fully. Per-provider max_output_tokens on the CLI path is not wired here — it requires capturing the resolved-provider dict during cli.py init (a >1500-line critical file), and #35518 is reworking that surface wholesale. Gateway is the path the truncation reports come from.

Closes #20741. Supersedes #20804, #19782.

Co-authored-by: ViewWay 834740219@qq.com

Infographic

max-tokens-propagation

ViewWay and others added 5 commits June 5, 2026 07:00
max_tokens set under model: in config.yaml was silently ignored.
The value was never read from config, never passed through
_resolve_runtime_agent_kwargs(), _resolve_turn_agent_config(),
or the session override path.  Added it to all three code paths
so custom/Ollama endpoints receive the correct output cap.

Closes #20741
…r override

Previous commit only covered the gateway runtime path. This adds:
- CLI __init__: read max_tokens from model config with HERMES_MAX_TOKENS env override
- CLI AIAgent() calls (interactive + background): pass max_tokens
- Gateway _resolve_runtime_agent_kwargs: add HERMES_MAX_TOKENS env override

All three code paths (CLI, gateway runtime, session override) now
consistently propagate max_tokens to AIAgent.
Widens ViewWay's #20741 fix to the sibling config surface: a
custom_providers entry can pin its own output cap via max_output_tokens
(or max_tokens). _get_named_custom_provider now lifts it onto the
resolved runtime at all three return sites, and the gateway uses it as a
fallback only when the documented global model.max_tokens isn't set, so
the global key always wins.

Precedence: HERMES_MAX_TOKENS > model.max_tokens > provider
max_output_tokens > None. Closes the same #20741 truncation for users who
configure the cap per-provider rather than globally.

Picks up the intent of #19782 (alexcam1901), reimplemented to feed
ViewWay's max_tokens pipeline.
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-56ef86e0 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9869 on HEAD, 9868 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unresolved-import 1
First entries
tests/gateway/test_max_tokens_propagation.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 5116 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/cli CLI entry point, hermes_cli/, setup wizard area/config Config system, migrations, profiles labels Jun 5, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Part of the max_tokens-propagation cluster — overlaps with open PRs #20769, #20804 (salvaged here), and #35518, all addressing #20741 (config.yaml max_tokens never reaching AIAgent). Flagging for maintainer dedup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: max_tokens from config.yaml is silently ignored — never propagated to AIAgent, causing output truncation on Ollama Cloud / zai / custom endpoints

3 participants