Problem
When using models with large output limits (e.g., DeepSeek V4 with 384K max_output_tokens), there is currently no way to configure max_tokens per-provider or per-model in config.yaml.
The custom_providers section already supports context_length per-model, but max_tokens is not read from config at all. AIAgent.__init__ only accepts max_tokens as a direct parameter, and neither cli.py nor gateway/run.py passes a config-based max_tokens value through.
This forces users who want to take advantage of large output limits (DeepSeek V4 384K, Gemini 2M, etc.) to either:
- Accept the API server's default (which may be far lower than the model's capability)
- Hardcode a global max_tokens that doesn't work across providers
Technical Details
Resolution chain currently works for context_length:
- config.yaml -> model.<provider_name>.models.<model_name>.context_length
- run_agent.py reads from custom_providers config in _config_context_length()
But no equivalent exists for max_tokens:
- run_agent.py:804 -- self.max_tokens only comes from __init__ parameter
- run_agent.py:1366-1431 -- custom_providers loop only reads context_length, not max_tokens
- run_agent.py:6644-6645 -- API call only sends max_tokens if self.max_tokens is not None
- cli.py:2795-2922 -- no max_tokens passed to AIAgent
- gateway/run.py:960 -- same
- Only batch_runner.py:329 has config.get("max_tokens"), but reads root-level config, not model-level
Proposed Solution
Add max_tokens support to the custom_providers model config, similar to how context_length works:
custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000
And/or add a top-level default_max_tokens config key for models without explicit config.
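One possible resolution order for the two options above, as a minimal sketch rather than the project's actual code. The helper name resolve_max_tokens, its signature, and the 8192 default in the sample config are hypothetical. It also assumes custom_providers is a mapping keyed by provider name, as in the YAML example, and that a result of None means max_tokens is omitted from the API call.

```python
from typing import Any, Mapping, Optional


def resolve_max_tokens(
    config: Mapping[str, Any],
    provider_name: str,
    model_name: str,
    explicit: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: pick max_tokens by precedence."""
    # 1. A value passed directly (e.g. to AIAgent.__init__) wins.
    if explicit is not None:
        return explicit

    # 2. Per-model value under custom_providers.<provider>.models.<model>.
    #    Assumes a mapping layout; missing levels fall through to {}.
    providers = config.get("custom_providers") or {}
    provider = providers.get(provider_name) or {}
    model_cfg = (provider.get("models") or {}).get(model_name) or {}
    value = model_cfg.get("max_tokens")

    # 3. Fall back to the proposed top-level default_max_tokens key.
    if value is None:
        value = config.get("default_max_tokens")

    # 4. None means "send no max_tokens and let the API server decide".
    if value is None:
        return None

    value = int(value)
    if value <= 0:
        raise ValueError(f"max_tokens must be positive, got {value}")
    return value


# Sample config mirroring the YAML above; default_max_tokens is illustrative.
config = {
    "default_max_tokens": 8192,
    "custom_providers": {
        "deepseek-v4": {
            "base_url": "https://api.deepseek.com/v1",
            "models": {
                "deepseek-chat": {"context_length": 1000000, "max_tokens": 384000}
            },
        }
    },
}

print(resolve_max_tokens(config, "deepseek-v4", "deepseek-chat"))  # 384000
print(resolve_max_tokens(config, "deepseek-v4", "other-model"))    # 8192
```

With this ordering, an explicit parameter keeps its current behavior, so existing callers that already pass max_tokens are unaffected.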
Additional Context