fix(agent_init): read max_tokens from custom_providers per-model config #28142
luyao618 wants to merge 1 commit
Mirror the existing context_length lookup: when a user configures custom_providers[].models.<model>.max_tokens, honor it instead of falling back to the 4096 default in chat_completion_helpers / conversation_loop. Fixes NousResearch#28046.
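As a rough illustration of why the per-model value matters, the sketch below mimics the max_tokens-or-4096 fallback this PR describes. The function name effective_max_tokens and the constant DEFAULT_MAX_TOKENS are hypothetical; only the fallback expression and the 4096 default come from the PR text.

```python
from typing import Optional

# The hard-coded default named in this PR.
DEFAULT_MAX_TOKENS = 4096


def effective_max_tokens(configured: Optional[int]) -> int:
    """Hypothetical stand-in for the `self.max_tokens or 4096` fallback."""
    # None (and any other falsy value) falls through to the default.
    return configured or DEFAULT_MAX_TOKENS


# Without the fix, the per-model config is never read, so the agent
# value stays None and every request is capped at the default.
capped = effective_max_tokens(None)

# With the fix, the configured per-model value reaches the request.
honored = effective_max_tokens(32000)
```

Under these assumptions, capped is the 4096 default and honored is the configured 32000.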
Board James CI triage for the current head (
Owner/maintainer action: no branch change requested from this triage. Let
Bumped into the exact same issue and wanted to contribute. Switching between [...] Now, although this fix is great for custom providers, it doesn't solve the issue with [...] I'd love to hear some thoughts. Will see if I can file a PR. Update: maybe #24495 is what I was thinking about.
Closing: open too long, no longer relevant.
Summary
Fixes #28046: max_tokens configured under custom_providers[].models.<model>.max_tokens was silently ignored. Output token requests fell back to the hard-coded 4096 default in chat_completion_helpers.py:269 / conversation_loop.py:2913, capping responses even when the user configured a higher per-model limit.
Root cause
agent/agent_init.py already had a per-model context_length lookup against custom_providers, but no equivalent for max_tokens. So self.max_tokens stayed None from the constructor default, and the self.max_tokens or 4096 fallback kicked in at request time.
Fix
Added a parallel max_tokens lookup right after the existing context_length block in agent/agent_init.py:
- Gated on agent.max_tokens is None (don't override an explicit constructor / CLI value).
- Matches base_url then model against the custom_providers list, using the same matching as the context_length branch above.
- Coerces with int(); positive values win, non-positive / non-numeric values log a warning via _ra().logger.warning and leave max_tokens unchanged (so the 4096 default still kicks in).
Tests
tests/run_agent/test_custom_provider_max_tokens.py, 6 cases:
- A configured max_tokens is applied
- A numeric string ("16000") parses and is applied
- A non-numeric string ("32K") is rejected with a warning, stays None
- [...] None
- An unset max_tokens leaves it None
- An explicit max_tokens is not overridden by the custom_providers lookup
All 6 pass; a broader tests/run_agent/ smoke run shows no regression.
Repro (from issue)
Before: responses cap at ~4096 tokens with finish_reason='length'.
After: configured max_tokens=32000 is honored.
Risk
Narrow: additive block, gated on max_tokens is None and _custom_providers. Mirrors a well-trodden code path right above it. Default behavior (no custom_providers.models.<m>.max_tokens set) is unchanged.
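The gated lookup described in this PR can be sketched roughly as follows. This is not the code from agent/agent_init.py: the function resolve_max_tokens, the dictionary layout of each provider entry, and the exact-equality base_url comparison are assumptions made for illustration. Only the gating, the base_url-then-model matching, the int() coercion, and the warning on invalid values come from the description above.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def resolve_max_tokens(
    current: Optional[int],
    custom_providers: List[Dict[str, Any]],
    base_url: str,
    model: str,
) -> Optional[int]:
    """Hypothetical helper: return a per-model max_tokens, or leave it as is."""
    # Gate: an explicit constructor / CLI value always wins, and an
    # empty provider list means there is nothing to look up.
    if current is not None or not custom_providers:
        return current

    for provider in custom_providers:
        # Assumption: providers are matched by exact base_url equality.
        if provider.get("base_url") != base_url:
            continue
        model_cfg = (provider.get("models") or {}).get(model) or {}
        raw = model_cfg.get("max_tokens")
        if raw is None:
            continue
        try:
            value = int(raw)  # accepts 16000 and "16000", rejects "32K"
        except (TypeError, ValueError):
            value = None
        if value is not None and value > 0:
            return value
        # Invalid value: warn and leave max_tokens unset, so the
        # request-time default still applies.
        logger.warning("Ignoring invalid max_tokens %r for model %s", raw, model)
        return None

    return None


# Assumed config shape, built from the keys named in this PR.
providers = [
    {
        "base_url": "http://localhost:8000/v1",
        "models": {"my-model": {"max_tokens": "16000"}},
    }
]
resolved = resolve_max_tokens(None, providers, "http://localhost:8000/v1", "my-model")
```

In this sketch, resolved is the integer 16000, while an explicit value passed as current is returned untouched and an unknown model leaves the result as None.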