fix(gateway): honor custom_providers max_tokens when constructing AIAgent #20121
Closed
konsisumer wants to merge 1 commit into
…gent The per-provider max_tokens cap set in custom_providers (or the new-style providers dict) was dropped during config normalization and never reached AIAgent. The gateway therefore fell back to the provider transport default even when the user had explicitly raised the cap. Whitelist max_tokens in the normalizer, propagate it through runtime provider resolution, and forward it via the gateway runtime dict with a fallback to model.max_tokens so a global cap is still honoured. Fixes NousResearch#20004.
Closing: deferring to #20149 by @Sanjays2402, which addresses the same issue. Reopen if that PR stalls.
Per-provider `max_tokens` set under `custom_providers` (or the new-style `providers` dict) was dropped during config normalization and never reached `AIAgent`, so the gateway always used provider transport defaults regardless of the user's cap.

## What changed and why

- `hermes_cli/config.py`: add `max_tokens` to `_KNOWN_KEYS` in `_normalize_custom_provider_entry` and preserve positive int values in the normalized entry. Without this, the key was dropped (and a spurious "unknown config keys" warning was logged).
- `hermes_cli/runtime_provider.py`: propagate `max_tokens` from `_get_named_custom_provider` (legacy list, v12 dict, and credential-pool branches) and from `_resolve_named_custom_runtime` so the resolved runtime dict carries the cap.
- `gateway/run.py`: include `max_tokens` in `_resolve_runtime_agent_kwargs` and the fallback-provider helper (with a fallback to top-level `model.max_tokens`), and forward it through `_resolve_turn_agent_config` so `AIAgent(**turn_route["runtime"])` receives the value.
- `tests/hermes_cli/test_custom_provider_max_tokens.py`: 10 new tests covering normalization (positive int, zero/negative rejection, non-int rejection, no spurious unknown-key warning), runtime propagation through the legacy list and v12 dict paths, omission semantics, and gateway precedence (runtime wins, falls back to `model.max_tokens`, returns `None` when neither is set).

Precedence is now: `custom_providers[].max_tokens` (carried on the runtime dict) → `model.max_tokens` (global) → `None` (provider transport default).

## How to test

- `pytest tests/hermes_cli/test_custom_provider_max_tokens.py -q` (10 passed locally)
- `pytest tests/hermes_cli/test_runtime_provider_resolution.py tests/hermes_cli/test_custom_provider_context_length.py tests/hermes_cli/test_config.py -q` (173 passed)
- `pytest tests/hermes_cli/ tests/gateway/ -q` shows only pre-existing platform/flaky failures (systemd D-Bus on macOS, whatsapp/discord adapter tests, an SSE-keepalive timing test) that also fail on `main`.
- Set `custom_providers: [{name: ark, base_url: ..., max_tokens: 131072}]` and confirm via gateway logs that the agent's `max_tokens` is 131072 instead of the provider default.

## What platforms tested on
Fixes #20004
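
For readers who want the validation and precedence rules from this description in one place, here is a minimal Python sketch. It is not the code from this PR: the helper names `normalize_max_tokens` and `resolve_max_tokens` are hypothetical, the dict shapes are simplified, and two choices are assumptions on my part (rejecting `bool` values, and validating the global `model.max_tokens` the same way as the per-provider value).

```python
from typing import Any, Optional


def normalize_max_tokens(value: Any) -> Optional[int]:
    """Hypothetical helper: keep only positive int values, else return None."""
    # bool is a subclass of int in Python; rejecting it is an assumption here,
    # not something the PR description states.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def resolve_max_tokens(runtime: dict, config: dict) -> Optional[int]:
    """Hypothetical helper mirroring the stated precedence:
    runtime dict value -> model.max_tokens (global) -> None.
    """
    per_provider = normalize_max_tokens(runtime.get("max_tokens"))
    if per_provider is not None:
        return per_provider
    model_cfg = config.get("model")
    if isinstance(model_cfg, dict):
        # Assumption: the global cap is validated like the per-provider cap.
        return normalize_max_tokens(model_cfg.get("max_tokens"))
    # None means "let the provider transport apply its own default".
    return None


if __name__ == "__main__":
    global_cfg = {"model": {"max_tokens": 4096}}
    print(resolve_max_tokens({"max_tokens": 131072}, global_cfg))  # 131072
    print(resolve_max_tokens({}, global_cfg))                      # 4096
    print(resolve_max_tokens({}, {}))                              # None
```

Returning `None` instead of a hard-coded number keeps the provider transport default as the last resort, which matches the third step of the precedence chain above.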