Skip to content

fix(config): propagate max_tokens from config.yaml to AI transport (#20741)#20769

Closed
Beandon13 wants to merge 1 commit into
NousResearch:mainfrom
Beandon13:fix/hermes-20741-max-tokens-config
Closed

fix(config): propagate max_tokens from config.yaml to AI transport (#20741)#20769
Beandon13 wants to merge 1 commit into
NousResearch:mainfrom
Beandon13:fix/hermes-20741-max-tokens-config

Conversation

@Beandon13

Copy link
Copy Markdown
Contributor

Summary

Fixes #20741. The model.max_tokens key documented in cli-config.yaml.example was silently ignored — never read from config and never forwarded to AIAgent.__init__() or ChatCompletionsTransport.build_kwargs(). On providers without a hardcoded output cap (Ollama Cloud, zai, custom OpenAI-compatible endpoints) the parameter was omitted from the API call entirely, causing finish_reason="length" truncation.

Three independent gaps fixed:

  • CLI path (cli.py): HermesCLI.__init__ now reads model.max_tokens from CLI_CONFIG (with HERMES_MAX_TOKENS env-var override) and stores it as self.max_tokens. Both the interactive agent and the background agent AIAgent(...) construction sites pass max_tokens=self.max_tokens.

  • Gateway path (gateway/run.py): new _resolve_config_max_tokens() helper reads model.max_tokens from ~/.hermes/config.yaml (same env-var override). _resolve_runtime_agent_kwargs() includes the value; _resolve_turn_agent_config() copies it into the runtime dict so all six AIAgent(...) sites receive it via **turn_route["runtime"] or **runtime_kwargs.

  • Transport layer (ChatCompletionsTransport.build_kwargs): already honours max_tokens when non-None — no changes needed.

Test proof

$ python -m pytest tests/test_max_tokens_config_propagation.py -v --override-ini="addopts="
...
tests/test_max_tokens_config_propagation.py::TestResolveConfigMaxTokens::test_returns_none_when_unset PASSED
tests/test_max_tokens_config_propagation.py::TestResolveConfigMaxTokens::test_reads_max_tokens_from_model_section PASSED
tests/test_max_tokens_config_propagation.py::TestResolveConfigMaxTokens::test_env_var_overrides_config PASSED
tests/test_max_tokens_config_propagation.py::TestResolveConfigMaxTokens::test_env_var_works_without_config_file PASSED
tests/test_max_tokens_config_propagation.py::TestResolveConfigMaxTokens::test_invalid_env_var_returns_none PASSED
tests/test_max_tokens_config_propagation.py::TestResolveRuntimeAgentKwargsMaxTokens::test_max_tokens_included_when_configured PASSED
tests/test_max_tokens_config_propagation.py::TestResolveRuntimeAgentKwargsMaxTokens::test_max_tokens_none_when_not_configured PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_reads_from_model_section PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_env_var_wins_over_config PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_none_when_unset_everywhere PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_invalid_env_var_returns_none PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_integer_coercion_from_string_in_config PASSED
tests/test_max_tokens_config_propagation.py::TestCliMaxTokensLogic::test_string_model_section_returns_none PASSED

13 passed, 1 warning in 0.85s

Files changed

File Change
cli.py Read model.max_tokens / HERMES_MAX_TOKENS; store as self.max_tokens; pass to both AIAgent sites
gateway/run.py Add _resolve_config_max_tokens(); include max_tokens in _resolve_runtime_agent_kwargs() and _resolve_turn_agent_config()
tests/test_max_tokens_config_propagation.py 13 new tests covering all three layers
CHANGELOG.md Unreleased entry

🤖 Generated with Claude Code

…ousResearch#20741)

model.max_tokens documented in cli-config.yaml.example was silently ignored —
never read from config and never forwarded to AIAgent.__init__() or
ChatCompletionsTransport.build_kwargs(). On providers without a hardcoded
output cap (Ollama Cloud, zai, custom OpenAI-compatible endpoints) the
parameter was omitted from the API call entirely, causing truncation.

Three gaps fixed:

1. cli.py: HermesCLI.__init__ now reads model.max_tokens from CLI_CONFIG
   (falling back to HERMES_MAX_TOKENS env var) and stores it as self.max_tokens.
   Both the interactive and background AIAgent construction sites pass
   max_tokens=self.max_tokens.

2. gateway/run.py: new _resolve_config_max_tokens() helper reads
   model.max_tokens from ~/.hermes/config.yaml (with HERMES_MAX_TOKENS env-var
   override). _resolve_runtime_agent_kwargs() returns the value; and
   _resolve_turn_agent_config() copies it into the runtime dict so all six
   AIAgent(...) call sites receive it via **turn_route["runtime"] or
   **runtime_kwargs.

3. ChatCompletionsTransport.build_kwargs() already honours max_tokens when
   non-None — no changes needed there.

Also adds tests/test_max_tokens_config_propagation.py with 13 tests covering
_resolve_config_max_tokens, _resolve_runtime_agent_kwargs, and the CLI
max_tokens resolution logic (all 13 pass).

Fixes: NousResearch#20741

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery area/config Config system, migrations, profiles labels May 6, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Automated hermes-sweeper review: this PR's fix is already implemented on main via the merged #39864 path.

Evidence:

Thanks for the original fix and test work here. The main-branch implementation preserves the same core behavior and widens it to the per-provider cap path as well.

@teknium1 teknium1 closed this Jun 11, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists sweeper:implemented-on-main Sweeper: behavior already present on current main type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: max_tokens from config.yaml is silently ignored — never propagated to AIAgent, causing output truncation on Ollama Cloud / zai / custom endpoints

3 participants