fix(gateway): honor max_tokens from custom_providers / providers entries (#20004) #20149
Open
Sanjays2402 wants to merge 3 commits into
A user setting `max_tokens` on a `custom_providers` entry currently has no effect: the field is stripped by config normalization (logged as an 'unknown key') and the runtime resolution path never carries it through, so the gateway falls back to either `model.max_tokens` or the provider's hardcoded default for that endpoint.

Add `max_tokens` to the normalizer's `_KNOWN_KEYS` and copy positive int values onto the normalized entry, then lift it onto every runtime dict produced by the named-custom-provider lookup paths (providers-dict keyed match, providers-dict display-name match, legacy `custom_providers` list, and pool-backed resolution).

A small `_attach_custom_provider_max_tokens` helper keeps the validation rule (positive int only) in one place so the four sites can't drift.

Refs NousResearch#20004.
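The single-validation-seam idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function signature, the `_is_positive_int` name, and the shape of the runtime and entry dicts are assumptions; only the rule itself (positive int only, never overwrite) comes from the commit message and the test summary.

```python
from typing import Any, Dict


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def attach_custom_provider_max_tokens(
    runtime: Dict[str, Any], entry: Dict[str, Any]
) -> Dict[str, Any]:
    """Copy a valid max_tokens from a provider entry onto a runtime dict.

    Hypothetical signature: the real helper may take different arguments.
    Invalid values are ignored, and an existing runtime value is kept.
    """
    value = entry.get("max_tokens")
    if _is_positive_int(value) and "max_tokens" not in runtime:
        runtime["max_tokens"] = value
    return runtime


# Each lookup path would call the same helper on the dict it builds.
runtime = attach_custom_provider_max_tokens(
    {"base_url": "http://localhost:8000/v1"},
    {"name": "local", "max_tokens": 8192},
)
```

Routing every lookup path through one function means a later change to the validation rule only has to be made once.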
The gateway's runtime resolution previously dropped any `max_tokens` set on a custom_providers / providers entry: it lifted only api_key/base_url/provider/api_mode/etc. onto the AIAgent kwargs. As a result, an explicit per-endpoint output cap was silently overridden by either `model.max_tokens` (also ignored) or the transport-layer hardcoded default for the provider family (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Resolve max_tokens with a documented priority chain:

1. runtime['max_tokens']: from the matched custom_providers entry
2. model.max_tokens: top-level config.yaml fallback
3. None: transport picks its provider-appropriate default

Apply the same chain in both the primary path (`_resolve_runtime_agent_kwargs`) and the auth-fallback path (`_try_resolve_fallback_provider`) so a fallback kick-in doesn't silently change the output cap.

Lift the resolved value onto `turn_route['runtime']` so the value reaches AIAgent via the existing `**turn_route['runtime']` constructor splat, with no call-site signature changes.

A small `_coerce_max_tokens` helper enforces the positive-int contract already used by config normalization, so a misconfigured "max_tokens: 64K" string falls through to the next layer instead of crashing the constructor.

Refs NousResearch#20004.
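The priority chain above might look something like the sketch below. The function names and argument shapes are assumptions made for illustration (the PR's own helpers are `_coerce_max_tokens` and the two resolver functions, whose signatures are not shown here); the ordering and the fall-through behaviour follow the commit message.

```python
from typing import Any, Mapping, Optional


def coerce_max_tokens(value: Any) -> Optional[int]:
    """Return value if it is a positive int, otherwise None."""
    # Reject bool explicitly because isinstance(True, int) is True.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def resolve_max_tokens(
    runtime: Mapping[str, Any], model_cfg: Mapping[str, Any]
) -> Optional[int]:
    """Walk the chain: runtime entry, then model config, then None."""
    for candidate in (runtime.get("max_tokens"), model_cfg.get("max_tokens")):
        resolved = coerce_max_tokens(candidate)
        if resolved is not None:
            return resolved
    # None lets the transport pick its provider-appropriate default.
    return None


# A misconfigured string on the provider entry falls through to the model value.
cap = resolve_max_tokens({"max_tokens": "64K"}, {"max_tokens": 8192})
```

Because the coercion happens at every layer, a bad value at one level never blocks a valid value further down the chain.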
17 cases pinning the fix for NousResearch#20004:

* normalization accepts positive int, drops zero/neg/string/bool;
* the `_attach_custom_provider_max_tokens` helper is the single validation seam (rejects None, never overwrites);
* end-to-end through `_get_named_custom_provider` for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.
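The normalization cases listed above could be expressed as a small table-driven check along these lines. Everything here is a stand-in: `KNOWN_KEYS`, `normalize_entry`, and the case table are hypothetical and only mirror the accept/drop rule stated in the commit message, not the real test file or normalizer.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical subset of the normalizer's known keys.
KNOWN_KEYS = {"name", "base_url", "api_key", "max_tokens"}


def normalize_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Keep known keys; keep max_tokens only when it is a positive int."""
    normalized = {
        key: value
        for key, value in entry.items()
        if key in KNOWN_KEYS and key != "max_tokens"
    }
    cap = entry.get("max_tokens")
    if isinstance(cap, int) and not isinstance(cap, bool) and cap > 0:
        normalized["max_tokens"] = cap
    return normalized


# (input value, whether max_tokens should survive normalization)
CASES: List[Tuple[Any, bool]] = [
    (8192, True),
    (0, False),
    (-1, False),
    ("64K", False),
    (True, False),
]


def run_cases() -> List[bool]:
    """Return one pass/fail flag per case."""
    results = []
    for value, should_keep in CASES:
        out = normalize_entry({"name": "local", "max_tokens": value})
        results.append(("max_tokens" in out) == should_keep)
    return results
```

In a pytest suite the same table would typically feed a parametrized test, one case per row.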
Closes #20004.
A `max_tokens` value set on a `custom_providers` (or `providers`) entry was silently dropped:

* `_normalize_custom_provider_entry` discarded the field as an unknown key.
* `_get_named_custom_provider` and `_resolve_named_custom_runtime` never lifted it onto the runtime dict.
* `_resolve_runtime_agent_kwargs` only read `model.max_tokens`.

Result: an explicit per-endpoint output cap was overridden by either `model.max_tokens` (also ignored if absent) or the transport-layer hardcoded default (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Fix: three layers

1. `hermes_cli/config.py`: add `max_tokens` to `_KNOWN_KEYS`; copy positive int values onto the normalized entry. Drop bogus values (zero/negative/string/bool) silently.
2. `hermes_cli/runtime_provider.py`: new `_attach_custom_provider_max_tokens` helper centralises the validation. Called from all four lookup paths (providers-dict-by-key, providers-dict-by-display-name, legacy `custom_providers` list, and pool-backed resolution) so they can't drift.
3. `gateway/run.py`: documented priority chain in `_resolve_runtime_agent_kwargs` and `_try_resolve_fallback_provider`:
   1. `runtime['max_tokens']`: from the matched custom-provider entry
   2. `model.max_tokens`: top-level config.yaml fallback
   3. `None`: AIAgent / transport picks a provider-appropriate default

A tiny `_coerce_max_tokens` helper enforces the positive-int contract so a misconfigured `max_tokens: 64K` falls through cleanly instead of crashing the constructor.

Test

17 new cases in `tests/hermes_cli/test_custom_provider_max_tokens.py` covering:

* the `_attach_custom_provider_max_tokens` helper across all input shapes (positive, zero, negative, string, None, missing, doesn't-overwrite),
* `_get_named_custom_provider` for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.

Pre-existing related suites stay green:

* `tests/hermes_cli/test_runtime_provider_resolution.py` (109 cases)
* `tests/hermes_cli/test_custom_provider_context_length.py` (12 cases)
* `tests/hermes_cli/test_provider_config_validation.py` (17 cases)

Notes
* The resolved value is lifted onto `turn_route['runtime']` and reaches AIAgent via the existing `**runtime` splat, so no call-site signature changes are needed.
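To illustrate why no call-site change is needed, here is a small sketch of a value riding along on a keyword splat. The `Agent` class is a hypothetical stand-in for AIAgent, and its constructor parameters are assumptions; only the `turn_route['runtime']` key and the splat mechanism come from the PR text.

```python
from typing import Any, Dict, Optional


class Agent:
    """Hypothetical stand-in for AIAgent; parameter names are assumed."""

    def __init__(
        self,
        base_url: str,
        api_key: str = "",
        max_tokens: Optional[int] = None,
        **extra: Any,
    ) -> None:
        self.base_url = base_url
        self.api_key = api_key
        # None means "let the transport choose its own default".
        self.max_tokens = max_tokens
        self.extra = extra


turn_route: Dict[str, Dict[str, Any]] = {
    "runtime": {"base_url": "http://localhost:8000/v1", "api_key": "sk-test"}
}

# The gateway lifts the resolved cap onto the runtime dict...
turn_route["runtime"]["max_tokens"] = 8192

# ...and the unchanged constructor call picks it up through the splat.
agent = Agent(**turn_route["runtime"])
```

The call expression stays the same whether or not `max_tokens` is present in the dict, which is what keeps the change local to the resolution code.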