Skip to content

Bug: max_tokens not read from custom_providers per-model config, always defaults to 4096 #28046

@dashixiong-droid

Description

@dashixiong-droid

Bug Description

When using a custom provider (e.g., xfyun) with a per-model max_tokens configured under custom_providers[].models.<model>.max_tokens, Hermes ignores this value and always defaults to 4096 output tokens.

Root Cause

In run_agent.py, context_length is correctly read from custom_providers per-model config on startup (lines 1896-1946), but there is no equivalent code for max_tokens.

The constructor sets self.max_tokens = max_tokens (default None at line 1208), and when None, the API call falls back to self.max_tokens or 4096 (line 8295).

Steps to Reproduce

  1. Configure a custom provider with per-model max_tokens:
custom_providers:
  - name: xfyun
    base_url: https://maas-coding-api.cn-huabei-1.xf-yun.com/v2
    api_key: ${API_KEY}
    api_mode: chat_completions
    model: astron-code-latest
    models:
      astron-code-latest:
        context_length: 200000
        max_tokens: 32000
        reasoning: true
  1. Start a session with gateway run --replace
  2. Ask the agent to generate a long response
  3. Observe Response truncated (finish_reason='length') - model hit max output tokens in the gateway log, with output capped at ~4096 tokens

Expected Behavior

Hermes should read max_tokens from custom_providers[].models.<model>.max_tokens (when present and valid) and use it as the output token limit, just as it already does for context_length.

Fix (tested and working)

Insert this block after the existing context_length custom_providers lookup in run_agent.py (after the _ensure_lmstudio_runtime_loaded call):

# Also read max_tokens from custom_providers per-model config
if self.max_tokens is None and _custom_providers:
    _target = self.base_url.rstrip("/") if self.base_url else ""
    for _cp_entry in _custom_providers:
        if not isinstance(_cp_entry, dict):
            continue
        _cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
        if _target and _cp_url == _target:
            _cp_models = _cp_entry.get("models", {})
            if isinstance(_cp_models, dict):
                _cp_model_cfg = _cp_models.get(self.model, {})
                if isinstance(_cp_model_cfg, dict):
                    _cp_mt = _cp_model_cfg.get("max_tokens")
                    if _cp_mt is not None:
                        try:
                            _parsed_mt = int(_cp_mt)
                            if _parsed_mt > 0:
                                self.max_tokens = _parsed_mt
                        except (TypeError, ValueError):
                            pass
            break

Related Issues

Environment

  • Hermes version: v0.10.0+ (config_version 23)
  • Profile: custom profile with custom_providers
  • Provider: OpenAI-compatible custom endpoint

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existsarea/configConfig system, migrations, profilescomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions