Problem
Currently, model.max_tokens in config.yaml is a global setting. When set, it applies to all providers, including the fallback chain. There is no way to specify a per-provider max_tokens override, unlike context_length, which already supports per-provider overrides via custom_providers[].models.<model>.context_length.
This is problematic when:
- A custom provider (e.g. Ark DeepSeek) needs an explicit max_tokens because auto-detection doesn't work
- Fallback providers (e.g. MiniMax, NVIDIA) should NOT inherit that same max_tokens value
Current workaround
Putting max_tokens in model: makes it global: with a value such as 131072, every provider, including fallbacks, sends max_tokens=131072 in every API call. The only way to avoid this today is to leave max_tokens unset entirely and accept whatever default each provider chooses.
Proposed solution
Add max_tokens support to custom_providers[].models.<model>.max_tokens, following the exact same pattern as the existing context_length override:
custom_providers:
  - name: My Provider
    base_url: https://...
    api_key: ...
    model: my-model
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072  # new field, per-provider
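As a rough illustration of how the new field could be read, here is a minimal sketch that assumes config.yaml has already been parsed into a plain dict. The function name is the one proposed in this issue, but its signature, the matching on the provider's name, and the validation rules are assumptions, not the existing Hermes API.

```python
from typing import Any, Dict, Optional


def get_custom_provider_max_tokens(
    config: Dict[str, Any], provider_name: str, model: str
) -> Optional[int]:
    """Return custom_providers[].models.<model>.max_tokens, or None if unset.

    Hypothetical signature: matching on the provider's `name` is an assumption.
    """
    for provider in config.get("custom_providers") or []:
        if not isinstance(provider, dict) or provider.get("name") != provider_name:
            continue
        models = provider.get("models")
        if not isinstance(models, dict):
            continue
        entry = models.get(model)
        value = entry.get("max_tokens") if isinstance(entry, dict) else None
        # Accept only positive integers; bool is a subclass of int, so exclude it.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


# Mirrors the YAML example above, already parsed into a dict.
example_config = {
    "custom_providers": [
        {
            "name": "My Provider",
            "model": "my-model",
            "models": {
                "my-model": {"context_length": 1000000, "max_tokens": 131072}
            },
        }
    ]
}
```

With this shape, a provider that has no models.<model>.max_tokens entry simply yields None, so the caller can omit max_tokens from the request and let that provider pick its own default.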
Implementation scope
hermes_cli/config.py — Add get_custom_provider_max_tokens() function parallel to get_custom_provider_context_length().
agent/agent_init.py — After the existing model.max_tokens fallback (around line 1166), add a second fallback that checks custom_providers for a per-provider max_tokens when agent.max_tokens is still None.
hermes_cli/main.py — Optionally update _save_custom_provider to save max_tokens into models.<model>.max_tokens.
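The resolution order described for agent/agent_init.py could look roughly like the sketch below. It is not the actual Hermes code: resolve_max_tokens, its parameters, and the lookup by provider name are hypothetical, and the config is assumed to be an already-parsed dict.

```python
from typing import Any, Dict, Optional


def resolve_max_tokens(
    explicit: Optional[int],
    config: Dict[str, Any],
    provider_name: str,
    model: str,
) -> Optional[int]:
    """Hypothetical helper showing the proposed fallback order."""
    # 1. A value already set on the agent wins.
    if explicit is not None:
        return explicit

    # 2. Existing fallback: the global model.max_tokens setting.
    model_cfg = config.get("model")
    if isinstance(model_cfg, dict) and model_cfg.get("max_tokens") is not None:
        return model_cfg["max_tokens"]

    # 3. Proposed second fallback: per-provider override, consulted only
    #    while the value is still None.
    for provider in config.get("custom_providers") or []:
        if isinstance(provider, dict) and provider.get("name") == provider_name:
            entry = (provider.get("models") or {}).get(model) or {}
            return entry.get("max_tokens")

    # No setting anywhere: let the provider choose its own default.
    return None
```

Under this ordering, a user who wants different limits per provider leaves model.max_tokens unset and sets max_tokens only under the custom provider that needs it. Fallback providers without an entry then resolve to None and send no max_tokens.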
Priority
Medium. Not a bug (everything works without it), but a missing feature that causes real confusion: users who set model.max_tokens expecting it to affect only their primary provider may inadvertently send the same value in their fallback API calls.