fix(gateway): honor max_tokens from custom_providers / providers entries (#20004) by Sanjays2402 · Pull Request #20149 · NousResearch/hermes-agent

Sanjays2402 · 2026-05-05T10:14:13Z

Closes #20004.

A max_tokens value set on a custom_providers (or providers) entry was silently dropped:

_normalize_custom_provider_entry discarded the field as an unknown key.
Runtime resolution (_get_named_custom_provider, _resolve_named_custom_runtime) never lifted it onto the runtime dict.
The gateway's _resolve_runtime_agent_kwargs only read model.max_tokens.

Result: an explicit per-endpoint output cap was overridden by either model.max_tokens (also ignored if absent) or the transport-layer hardcoded default (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Fix — three layers

Normalization (hermes_cli/config.py): add max_tokens to _KNOWN_KEYS; copy positive int values onto the normalized entry. Drop bogus values (zero/negative/string/bool) silently.
Runtime lift (hermes_cli/runtime_provider.py): new _attach_custom_provider_max_tokens helper centralises the validation. Called from all four lookup paths — providers-dict-by-key, providers-dict-by-display-name, legacy custom_providers list, and pool-backed resolution — so they can't drift.
Gateway resolution (gateway/run.py): documented priority chain in _resolve_runtime_agent_kwargs and _try_resolve_fallback_provider:
1. runtime['max_tokens'] — from the matched custom-provider entry
2. model.max_tokens — top-level config.yaml fallback
3. None → AIAgent / transport picks a provider-appropriate default
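The first two layers can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `normalize_entry_sketch` and `attach_max_tokens_sketch`, the `known_keys` subset, and the "key already present means do not overwrite" rule are assumptions made for the example. Only the positive-int contract and the never-overwrite behaviour are taken from the description.

```python
from typing import Any, Dict


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so it must be excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def normalize_entry_sketch(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for _normalize_custom_provider_entry."""
    # Assumed subset of known keys; the real _KNOWN_KEYS is not shown in the PR.
    known_keys = {"name", "base_url", "api_key", "max_tokens"}
    normalized = {
        key: value
        for key, value in entry.items()
        if key in known_keys and key != "max_tokens"
    }
    # Copy max_tokens only when it is a positive int; drop anything else silently.
    if _is_positive_int(entry.get("max_tokens")):
        normalized["max_tokens"] = entry["max_tokens"]
    return normalized


def attach_max_tokens_sketch(
    runtime: Dict[str, Any], entry: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical stand-in for _attach_custom_provider_max_tokens."""
    value = entry.get("max_tokens")
    # Assumption: an existing runtime value wins, so the helper never overwrites.
    if _is_positive_int(value) and "max_tokens" not in runtime:
        runtime["max_tokens"] = value
    return runtime
```

Routing every lookup path through one helper like this is what keeps the validation rule from diverging between call sites.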

A tiny _coerce_max_tokens helper enforces the positive-int contract so a misconfigured max_tokens: 64K falls through cleanly instead of crashing the constructor.
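As a rough sketch of how the coercion and the priority chain fit together (the names `coerce_max_tokens_sketch` and `resolve_max_tokens_sketch` are hypothetical, and strict rejection of every non-int value, including numeric strings, is an assumption):

```python
from typing import Any, Mapping, Optional


def coerce_max_tokens_sketch(value: Any) -> Optional[int]:
    """Return value if it is a positive int, else None (assumed contract)."""
    # Reject bools first: isinstance(True, int) is True in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def resolve_max_tokens_sketch(
    runtime: Mapping[str, Any], model_cfg: Mapping[str, Any]
) -> Optional[int]:
    """Walk the chain: runtime entry, then model config, then None."""
    for candidate in (runtime.get("max_tokens"), model_cfg.get("max_tokens")):
        coerced = coerce_max_tokens_sketch(candidate)
        if coerced is not None:
            return coerced
    # None lets the agent or transport choose its own default.
    return None
```

With this shape, `resolve_max_tokens_sketch({"max_tokens": "64K"}, {"max_tokens": 4096})` skips the malformed string and returns the model-level value.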

Test

17 new cases in tests/hermes_cli/test_custom_provider_max_tokens.py covering:

normalization accept/reject for positive int / zero / negative / string / missing key,
the _attach_custom_provider_max_tokens helper across all input shapes (positive, zero, negative, string, None, missing, doesn't-overwrite),
end-to-end through _get_named_custom_provider for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.

Pre-existing related suites stay green:

tests/hermes_cli/test_runtime_provider_resolution.py (109 cases)
tests/hermes_cli/test_custom_provider_context_length.py (12 cases)
tests/hermes_cli/test_provider_config_validation.py (17 cases)

Notes

No call-site signature changes — value flows through turn_route['runtime'] via the existing **runtime splat into AIAgent.
Same chain applied in primary and auth-fallback resolution so a fallback kick-in doesn't silently change the output cap.
Related to #19991 ("fix: properly pass model.max_tokens config to AIAgent in gateway"), but reimplemented from scratch against current main.
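To illustrate why no call-site signature changes are needed, here is a small sketch of a value reaching a constructor through a keyword splat. `AgentSketch` and `build_agent_sketch` are hypothetical stand-ins; the real AIAgent constructor signature is not shown in this PR, so the parameters below are assumptions.

```python
from typing import Any, Dict, Optional


class AgentSketch:
    """Hypothetical stand-in for AIAgent."""

    def __init__(
        self,
        base_url: str = "",
        api_key: str = "",
        max_tokens: Optional[int] = None,
    ) -> None:
        self.base_url = base_url
        self.api_key = api_key
        # None means "let the transport pick its own default".
        self.max_tokens = max_tokens


def build_agent_sketch(turn_route: Dict[str, Any]) -> AgentSketch:
    # Any key present on the runtime dict becomes a constructor keyword,
    # so adding max_tokens to the dict requires no change at this call site.
    return AgentSketch(**turn_route["runtime"])
```

If the runtime dict carries no `max_tokens` key, the constructor default of `None` applies, which matches the third step of the priority chain.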

A user setting `max_tokens` on a `custom_providers` entry currently has no effect: the field is stripped by config normalization (logged as an 'unknown key') and the runtime resolution path never carries it through, so the gateway falls back to either `model.max_tokens` or the provider's hardcoded default for that endpoint. Add `max_tokens` to the normalizer's `_KNOWN_KEYS` and copy positive int values onto the normalized entry, then lift it onto every runtime dict produced by the named-custom-provider lookup paths (providers-dict keyed match, providers-dict display-name match, legacy custom_providers list, and pool-backed resolution). A small `_attach_custom_provider_max_tokens` helper keeps the validation rule (positive int only) in one place so the four sites can't drift. Refs NousResearch#20004.

The gateway's runtime resolution previously dropped any `max_tokens` set on a custom_providers / providers entry: it lifted only api_key/base_url/provider/api_mode/etc onto the AIAgent kwargs. As a result, an explicit per-endpoint output cap was silently overridden by either `model.max_tokens` (also ignored) or the transport-layer hardcoded default for the provider family (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Resolve max_tokens with a documented priority chain:

1. runtime['max_tokens'] - from the matched custom_providers entry
2. model.max_tokens - top-level config.yaml fallback
3. None - transport picks its provider-appropriate default

Apply the same chain in both the primary path (`_resolve_runtime_agent_kwargs`) and the auth-fallback path (`_try_resolve_fallback_provider`) so a fallback kick-in doesn't silently change the output cap. Lift the resolved value onto `turn_route['runtime']` so the value reaches AIAgent via the existing `**turn_route['runtime']` constructor splat, with no call-site signature changes.

A small `_coerce_max_tokens` helper enforces the positive-int contract already used by config normalization, so a misconfigured "max_tokens: 64K" string falls through to the next layer instead of crashing the constructor. Refs NousResearch#20004.

17 cases pinning the fix for NousResearch#20004:

* normalization accepts positive int, drops zero/neg/string/bool;
* the _attach_custom_provider_max_tokens helper is the single validation seam (rejects None, never overwrites);
* end-to-end through _get_named_custom_provider for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.

alt-glitch · 2026-05-05T10:19:06Z

Duplicate of #20121 — same three-layer fix (normalization + runtime lift + gateway resolution) for max_tokens not propagating from custom_providers. Both close #20004.

Sanjays2402 added 3 commits May 5, 2026 03:06

alt-glitch added the type/bug, comp/gateway, comp/cli, and P2 labels May 5, 2026

konsisumer mentioned this pull request May 6, 2026

fix(gateway): honor custom_providers max_tokens when constructing AIAgent #20121

Closed

alt-glitch mentioned this pull request May 7, 2026

Custom provider max_output_tokens silently dropped by config.py normalizer — defaults to model minimum (2048) #21498

Open

alt-glitch mentioned this pull request May 18, 2026

Bug: max_tokens not read from custom_providers per-model config, always defaults to 4096 #28046

Open

alt-glitch mentioned this pull request May 30, 2026

feat(custom_providers): per-model max_tokens with switch/fallback re-resolution #35518

Open

Sanjays2402 wants to merge 3 commits into NousResearch:main from Sanjays2402:fix/issue-20004
