Skip to content

fix(gateway): propagate max_tokens from config.yaml to AIAgent#20804

Closed
ViewWay wants to merge 2 commits into
NousResearch:mainfrom
ViewWay:fix/max-tokens-config-propagation
Closed

fix(gateway): propagate max_tokens from config.yaml to AIAgent#20804
ViewWay wants to merge 2 commits into
NousResearch:mainfrom
ViewWay:fix/max-tokens-config-propagation

Conversation

@ViewWay

@ViewWay ViewWay commented May 6, 2026

Copy link
Copy Markdown
Contributor

Summary

max_tokens set under model: in config.yaml was never read or propagated to AIAgent.__init__(). On providers without a hardcoded default (Ollama Cloud, custom endpoints), the parameter was omitted entirely, causing truncated responses with finish_reason="length".

Changes

Four surgical additions — one new key in each of three runtime dicts, plus the config read:

Location Change
_resolve_runtime_agent_kwargs() Read model.max_tokens from config, add to return dict
_resolve_session_agent_runtime() override path Add max_tokens to override_runtime
_resolve_turn_agent_config() Pass through max_tokens from runtime_kwargs

No behavioral change when max_tokens is unset (remains None → model default).

Testing

  • Syntax verified with py_compile.

Closes #20741

max_tokens set under model: in config.yaml was silently ignored.
The value was never read from config, never passed through
_resolve_runtime_agent_kwargs(), _resolve_turn_agent_config(),
or the session override path.  Added it to all three code paths
so custom/Ollama endpoints receive the correct output cap.

Closes NousResearch#20741
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery area/config Config system, migrations, profiles labels May 6, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #20769 — both PRs fix the same issue (#20741: max_tokens from config.yaml not propagated to AIAgent). PR #20769 is more comprehensive (also covers CLI path + env var override).

…r override

Previous commit only covered the gateway runtime path. This adds:
- CLI __init__: read max_tokens from model config with HERMES_MAX_TOKENS env override
- CLI AIAgent() calls (interactive + background): pass max_tokens
- Gateway _resolve_runtime_agent_kwargs: add HERMES_MAX_TOKENS env override

All three code paths (CLI, gateway runtime, session override) now
consistently propagate max_tokens to AIAgent.
@teknium1

teknium1 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Merged via #39864 — your two commits were cherry-picked onto current main with your authorship preserved in git log (cf78659, 1c909e7). Thanks for the fix! We also widened it so a custom_providers entry can pin its own max_output_tokens (the global model.max_tokens still wins when both are set), and added regression tests for the precedence chain.

#39864

@teknium1 teknium1 closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: max_tokens from config.yaml is silently ignored — never propagated to AIAgent, causing output truncation on Ollama Cloud / zai / custom endpoints

3 participants