fix(gateway): honor max_tokens from custom_providers / providers entries (#20004) #20149

Open
Sanjays2402 wants to merge 3 commits into NousResearch:main from Sanjays2402:fix/issue-20004

Conversation

@Sanjays2402

Contributor

Closes #20004.

A max_tokens value set on a custom_providers (or providers) entry was silently dropped:

  • _normalize_custom_provider_entry discarded the field as an unknown key.
  • Runtime resolution (_get_named_custom_provider, _resolve_named_custom_runtime) never lifted it onto the runtime dict.
  • The gateway's _resolve_runtime_agent_kwargs only read model.max_tokens.

Result: an explicit per-endpoint output cap was overridden by either model.max_tokens (also ignored if absent) or the transport-layer hardcoded default (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Fix — three layers

  1. Normalization (hermes_cli/config.py): add max_tokens to _KNOWN_KEYS; copy positive int values onto the normalized entry. Drop bogus values (zero/negative/string/bool) silently.
  2. Runtime lift (hermes_cli/runtime_provider.py): new _attach_custom_provider_max_tokens helper centralises the validation. Called from all four lookup paths — providers-dict-by-key, providers-dict-by-display-name, legacy custom_providers list, and pool-backed resolution — so they can't drift.
  3. Gateway resolution (gateway/run.py): documented priority chain in _resolve_runtime_agent_kwargs and _try_resolve_fallback_provider:
    1. runtime['max_tokens'] — from the matched custom-provider entry
    2. model.max_tokens — top-level config.yaml fallback
    3. None → AIAgent / transport picks a provider-appropriate default

A tiny _coerce_max_tokens helper enforces the positive-int contract so a misconfigured max_tokens: 64K falls through cleanly instead of crashing the constructor.

Test

17 new cases in tests/hermes_cli/test_custom_provider_max_tokens.py covering:

  • normalization accept/reject for positive int / zero / negative / string / missing key,
  • the _attach_custom_provider_max_tokens helper across all input shapes (positive, zero, negative, string, None, missing, doesn't-overwrite),
  • end-to-end through _get_named_custom_provider for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.

Pre-existing related suites stay green:

  • tests/hermes_cli/test_runtime_provider_resolution.py (109 cases)
  • tests/hermes_cli/test_custom_provider_context_length.py (12 cases)
  • tests/hermes_cli/test_provider_config_validation.py (17 cases)

Notes

  • No call-site signature changes — value flows through turn_route['runtime'] via the existing **runtime splat into AIAgent.
  • Same chain applied in primary and auth-fallback resolution so a fallback kick-in doesn't silently change the output cap.
  • Related to #19991 ("fix: properly pass model.max_tokens config to AIAgent in gateway"), but reimplemented from scratch against current main.
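The first note can be illustrated with a small sketch of the splat mechanism. The `AIAgent` class below is a stand-in written for this example: its parameter list, the model name, and the URL are assumptions, and only `turn_route['runtime']` and the `**` splat are taken from the description.

```python
from typing import Any, Dict, Optional


class AIAgent:
    """Stand-in for the real AIAgent; only the parameters needed here are modelled."""

    def __init__(
        self,
        model: str,
        base_url: Optional[str] = None,
        max_tokens: Optional[int] = None,
        **_ignored: Any,
    ) -> None:
        self.model = model
        self.base_url = base_url
        self.max_tokens = max_tokens


# Once resolution has placed max_tokens on the runtime dict, the existing
# keyword splat carries it into the constructor with no signature change.
turn_route: Dict[str, Any] = {
    "runtime": {"base_url": "http://localhost:8000/v1", "max_tokens": 8192},
}
agent = AIAgent(model="my-model", **turn_route["runtime"])
```

If the key is absent from the runtime dict, `max_tokens` stays `None` in this sketch, which corresponds to step 3 of the chain.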

A user setting `max_tokens` on a `custom_providers` entry currently has
no effect: the field is stripped by config normalization (logged as an
'unknown key') and the runtime resolution path never carries it through,
so the gateway falls back to either `model.max_tokens` or the provider's
hardcoded default for that endpoint.

Add `max_tokens` to the normalizer's `_KNOWN_KEYS` and copy positive
int values onto the normalized entry, then lift it onto every runtime
dict produced by the named-custom-provider lookup paths (providers-dict
keyed match, providers-dict display-name match, legacy custom_providers
list, and pool-backed resolution).  A small `_attach_custom_provider_max_tokens`
helper keeps the validation rule (positive int only) in one place so the
four sites can't drift.

Refs NousResearch#20004.

The gateway's runtime resolution previously dropped any `max_tokens` set
on a custom_providers / providers entry: it lifted only api_key/base_url/
provider/api_mode/etc onto the AIAgent kwargs.  As a result, an explicit
per-endpoint output cap was silently overridden by either
`model.max_tokens` (also ignored) or the transport-layer hardcoded
default for the provider family (4096 for Anthropic Bedrock, 16384 for
NVIDIA NIM, etc.).

Resolve max_tokens with a documented priority chain:
  1. runtime['max_tokens']  - from the matched custom_providers entry
  2. model.max_tokens       - top-level config.yaml fallback
  3. None                   - transport picks its provider-appropriate default

Apply the same chain in both the primary path (`_resolve_runtime_agent_kwargs`)
and the auth-fallback path (`_try_resolve_fallback_provider`) so a
fallback kick-in doesn't silently change the output cap.  Lift the
resolved value onto `turn_route['runtime']` so the value reaches AIAgent
via the existing `**turn_route['runtime']` constructor splat, with no
call-site signature changes.

A small `_coerce_max_tokens` helper enforces the positive-int contract
already used by config normalization, so a misconfigured "max_tokens: 64K"
string falls through to the next layer instead of crashing the constructor.

Refs NousResearch#20004.

17 cases pinning the fix for NousResearch#20004:

* normalization accepts positive int, drops zero/neg/string/bool;
* the _attach_custom_provider_max_tokens helper is the single
  validation seam (rejects None, never overwrites);
* end-to-end through _get_named_custom_provider for legacy
  custom_providers list, providers-dict-by-key, and
  providers-dict-by-display-name lookup paths.
@alt-glitch added labels on May 5, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), comp/cli (CLI entry point, hermes_cli/, setup wizard), P2 (Medium: degraded but workaround exists)
@alt-glitch

Collaborator

Duplicate of #20121 — same three-layer fix (normalization + runtime lift + gateway resolution) for max_tokens not propagating from custom_providers. Both close #20004.


Development

Successfully merging this pull request may close these issues.

max_tokens config from custom_providers is not passed to AIAgent

2 participants