feat(config): support model.max_tokens override in config.yaml by leon7609 · Pull Request #18445 · NousResearch/hermes-agent

leon7609 · 2026-05-01T15:29:53Z

What does this PR do?

Adds a config-driven max_tokens override for OpenAI-compatible custom providers, mirroring the existing model.context_length lookup.

Custom providers (vLLM, llama.cpp, ollama, ...) that do not advertise a max_tokens default via /models cause responses to truncate with finish_reason='length' because the agent never sends a max_tokens hint. The chat_completions transport leaves max_tokens unset, the server uses its own (often conservative) default, and long answers get cut mid-sentence.

This PR introduces:

A new get_max_tokens_from_config(model, base_url, config) helper in hermes_cli/config.py — alongside the existing get_custom_provider_context_length.
A wire-up in AIAgent.__init__ that calls the helper after the existing context_length resolution and applies its value to self.max_tokens only when the constructor didn't pass an explicit value.

Lookup order (first valid positive int wins):

model.max_tokens (top-level — applies to whichever model is active)
custom_providers.<>.models.<model>.max_tokens (per-model, scoped to the entry whose base_url matches)

No behaviour change for built-in providers (Anthropic / OpenAI / OpenRouter / Bedrock / Gemini) — they still resolve max_tokens through their own adapter logic.

Aligns with the schema proposed in issue #15037.

Related Issue

Fixes the user-side workaround needed for #15037.

Type of Change

✨ New feature (non-breaking change that adds functionality)

Changes Made

hermes_cli/config.py — add get_max_tokens_from_config(model, base_url, config=None) -> Optional[int]:
- Reads top-level model.max_tokens first; logs a warning and falls through if non-int / non-positive.
- Falls back to custom_providers.<>.models.<>.max_tokens (URL match is trailing-slash insensitive, value must be a positive int).
- Returns None when neither location holds a valid value.
run_agent.py — AIAgent.__init__: after the existing context_length resolution, call the helper and apply the value to self.max_tokens if the constructor didn't pass one.
tests/hermes_cli/test_max_tokens_config.py — 15 unit tests covering top-level / per-model / precedence / edge cases / providers-dict (v12+) form.

How to Test

Configure a custom provider in ~/.hermes/config.yaml:

model:
  default: qwen3.6-27b-fp8
  provider: custom
  base_url: http://192.168.x.x:8080/v1
  max_tokens: 32768

Send a prompt likely to produce a long response (e.g. "summarise this 50K-word document").
Before this fix: the response gets cut off at the server's default cap (often 2K–4K) with finish_reason='length'.
After this fix: the agent sends max_tokens=32768, and the response runs to a natural stop.

Or run the unit tests:

pytest tests/hermes_cli/test_max_tokens_config.py -v

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (feat(config): ...)
I searched for existing PRs — feat(config): support per-model max_tokens in custom_providers config #15037 proposes this schema but no implementation has landed
My PR contains only changes related to this feature
I've run pytest tests/ -q and the changes don't introduce new failures (vs. main baseline)
I've added tests for the change (15 cases)
I've tested on my platform: macOS 15 (Apple Silicon), Qwen 3.6 27B FP8 served via vLLM

Documentation & Housekeeping

No breaking change to the schema — model.max_tokens is opt-in
Existing context_length semantics unchanged

Custom OpenAI-compatible providers (vLLM, llama.cpp, ollama, ...) that do not advertise a max_tokens default via /models cause responses to truncate with `finish_reason='length'` because the agent never sends a max_tokens hint. The chat_completions transport leaves max_tokens unset, the server uses its own (often conservative) default, and long answers get cut mid-sentence. Mirror the existing `model.context_length` override pattern: - Top-level `model.max_tokens` (preferred) - Legacy `custom_providers.<>.models.<>.max_tokens` (per-model) Read in AIAgent.__init__ right after the context_length resolution block; apply only when the constructor did not pass an explicit max_tokens. No behaviour change for built-in providers (Anthropic / OpenAI / OpenRouter) — they still resolve max_tokens via their own adapter logic. Aligns with the schema proposed in upstream issue NousResearch#15037.

leon7609 · 2026-05-19T04:51:49Z

Closing as superseded by upstream.

v0.14.0 (v2026.5.16) ships native model.max_tokens support in run_agent.py via commit a78e622 ("fix(agent): honor configured model max tokens"). It reads model.max_tokens from config.yaml with bool / non-positive / int-parse validation and an invalid-value warning — functionally equivalent to this PR's intent (which was aligned with #15037). Carrying a parallel get_max_tokens_from_config helper would only duplicate the behavior and conflict on rebase.

Verified locally on a 0.13→0.14 upgrade: dropped this patch, upstream's native path covers the same config-driven override. Thanks for the consideration!

alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder area/config Config system, migrations, profiles labels May 1, 2026

leon7609 mentioned this pull request May 3, 2026

feat(gateway): support [[as_document]] directive for skill media routing #19067

Closed

alt-glitch mentioned this pull request May 18, 2026

fix(agent): honor custom provider max_tokens #28154

Open

leon7609 closed this May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(config): support model.max_tokens override in config.yaml#18445

feat(config): support model.max_tokens override in config.yaml#18445
leon7609 wants to merge 1 commit into
NousResearch:mainfrom
leon7609:feat/config-max-tokens-override

leon7609 commented May 1, 2026

Uh oh!

leon7609 commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leon7609 commented May 1, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Uh oh!

leon7609 commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants