Bug Description
When using a custom provider (e.g., xfyun) with a per-model max_tokens configured under custom_providers[].models.<model>.max_tokens, Hermes ignores this value and always defaults to 4096 output tokens.
Root Cause
In run_agent.py, context_length is correctly read from custom_providers per-model config on startup (lines 1896-1946), but there is no equivalent code for max_tokens.
The constructor sets self.max_tokens = max_tokens (default None at line 1208). When it is still None at request time, the API call uses self.max_tokens or 4096 (line 8295), so the configured per-model value never takes effect.
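The fallback can be reproduced in isolation. The snippet below is only an illustrative sketch of the `self.max_tokens or 4096` expression; `effective_max_tokens` and `DEFAULT_MAX_TOKENS` are hypothetical names, not identifiers from run_agent.py.

```python
# Hypothetical stand-in for the `self.max_tokens or 4096` fallback.
DEFAULT_MAX_TOKENS = 4096


def effective_max_tokens(max_tokens=None):
    """Return the output-token limit that would be sent with the API call."""
    # None (and any other falsy value, such as 0) falls through to the default.
    return max_tokens or DEFAULT_MAX_TOKENS


# Nothing populated max_tokens from the per-model config, so the default wins.
print(effective_max_tokens())
# An explicitly passed value is used as-is.
print(effective_max_tokens(32000))
```

Because nothing assigns self.max_tokens from custom_providers before this point, the first case is the one a custom-provider session ends up in.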
Steps to Reproduce
- Configure a custom provider with per-model max_tokens:
    custom_providers:
      - name: xfyun
        base_url: https://maas-coding-api.cn-huabei-1.xf-yun.com/v2
        api_key: ${API_KEY}
        api_mode: chat_completions
        model: astron-code-latest
        models:
          astron-code-latest:
            context_length: 200000
            max_tokens: 32000
            reasoning: true
- Start a session with gateway run --replace
- Ask the agent to generate a long response
- Observe "Response truncated (finish_reason='length') - model hit max output tokens" in the gateway log, with output capped at ~4096 tokens
Expected Behavior
Hermes should read max_tokens from custom_providers[].models.<model>.max_tokens (when present and valid) and use it as the output token limit, just as it already does for context_length.
Fix (tested and working)
Insert this block after the existing context_length custom_providers lookup in run_agent.py (after the _ensure_lmstudio_runtime_loaded call):
# Also read max_tokens from custom_providers per-model config
if self.max_tokens is None and _custom_providers:
    _target = self.base_url.rstrip("/") if self.base_url else ""
    for _cp_entry in _custom_providers:
        if not isinstance(_cp_entry, dict):
            continue
        _cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
        if _target and _cp_url == _target:
            _cp_models = _cp_entry.get("models", {})
            if isinstance(_cp_models, dict):
                _cp_model_cfg = _cp_models.get(self.model, {})
                if isinstance(_cp_model_cfg, dict):
                    _cp_mt = _cp_model_cfg.get("max_tokens")
                    if _cp_mt is not None:
                        try:
                            _parsed_mt = int(_cp_mt)
                            if _parsed_mt > 0:
                                self.max_tokens = _parsed_mt
                        except (TypeError, ValueError):
                            pass
            break
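For review or unit testing, the same lookup can be expressed as a standalone function. This is a sketch under assumptions, not code from run_agent.py: `resolve_max_tokens` is a hypothetical helper name, and it assumes custom_providers is the already-parsed list of provider dicts. It stops at the first provider whose base_url matches, as I understand the block above to do.

```python
from typing import Any, Optional


def resolve_max_tokens(custom_providers: Any, base_url: Optional[str], model: str) -> Optional[int]:
    """Return the per-model max_tokens for the provider matching base_url, or None."""
    target = (base_url or "").rstrip("/")
    if not target or not isinstance(custom_providers, list):
        return None
    for entry in custom_providers:
        if not isinstance(entry, dict):
            continue
        # Compare URLs with trailing slashes stripped.
        if (entry.get("base_url") or "").rstrip("/") != target:
            continue
        models = entry.get("models")
        model_cfg = models.get(model) if isinstance(models, dict) else None
        raw = model_cfg.get("max_tokens") if isinstance(model_cfg, dict) else None
        try:
            parsed = int(raw)
        except (TypeError, ValueError):
            # Missing or non-numeric value: leave the caller's default in place.
            return None
        # Only positive limits are accepted; the first matching provider ends the search.
        return parsed if parsed > 0 else None
    return None


providers = [
    {
        "name": "xfyun",
        "base_url": "https://maas-coding-api.cn-huabei-1.xf-yun.com/v2",
        "models": {"astron-code-latest": {"context_length": 200000, "max_tokens": 32000}},
    }
]
print(resolve_max_tokens(providers, "https://maas-coding-api.cn-huabei-1.xf-yun.com/v2/", "astron-code-latest"))
```

Returning None for missing, non-numeric, or non-positive values keeps the existing 4096 fallback intact when the config is absent or invalid.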
Related Issues
Environment
- Hermes version: v0.10.0+ (config_version 23)
- Profile: custom profile with custom_providers
- Provider: OpenAI-compatible custom endpoint