
[Bug]: model.max_tokens in config.yaml has no effect — setting is never passed to AIAgent #4404

@shokollm


Bug Description

Setting max_tokens under the model key in config.yaml does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the AIAgent constructor, making the setting completely ineffective.

model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing

Steps to Reproduce

  1. Set model.max_tokens: 8192 in config.yaml
  2. Run hermes chat and engage in a conversation requiring a long response
  3. Observe the response gets truncated with no error message
  4. Check the API request — max_tokens is not included

Expected Behavior

model.max_tokens in config.yaml should be passed to the AIAgent constructor, which then sends it to the API.

Actual Behavior

The AIAgent constructor accepts a max_tokens parameter (run_agent.py#L660), but callers never provide it:

  • cli.py#L2100: self.agent = AIAgent(...) — called without max_tokens
  • gateway/run.py#L781-L789: _resolve_turn_agent_config() builds a primary dict without max_tokens, so it never flows through to AIAgent

The _build_api_kwargs method (run_agent.py#L4864) only adds max_tokens to the API request when self.max_tokens is not None:

if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

Since callers never pass max_tokens, it is always None and the parameter is never sent — except for a hardcoded band-aid for OpenRouter + Claude (see below).
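The effect of this branch can be reproduced in isolation. The following is a minimal sketch, not the project's code: build_api_kwargs, the is_openrouter flag, and the anthropic_limit value are hypothetical stand-ins for _build_api_kwargs, _is_openrouter_url(), and _get_anthropic_max_output(), and the real code routes the value through _max_tokens_param rather than setting the key directly.

```python
from typing import Any, Dict, Optional


def build_api_kwargs(
    model: str,
    max_tokens: Optional[int] = None,
    is_openrouter: bool = False,
    anthropic_limit: int = 4096,  # hypothetical placeholder, not a real per-model limit
) -> Dict[str, Any]:
    """Simplified stand-in for the max_tokens branch in _build_api_kwargs."""
    api_kwargs: Dict[str, Any] = {"model": model}
    if max_tokens is not None:
        # An explicitly provided value always wins.
        api_kwargs["max_tokens"] = max_tokens
    elif is_openrouter and "claude" in (model or "").lower():
        # Fallback used only when no value was provided.
        api_kwargs["max_tokens"] = anthropic_limit
    return api_kwargs


# Current behavior: callers pass nothing, so the key is absent from the request.
print(build_api_kwargs("MiniMax-M2.7"))
# With the value forwarded, it reaches the request.
print(build_api_kwargs("MiniMax-M2.7", max_tokens=8192))
```

Because the explicit branch is checked first, a forwarded config value would also take precedence over the OpenRouter + Claude fallback.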

Root Cause Analysis (confirmed with live API intercept)

I patched the installed hermes-agent to log what _build_api_kwargs() sends to the API. Two tests were run:

Test 1 — Current behavior (BUG):

[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!

Test 2 — With max_tokens=50 passed to AIAgent:

[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!

MiniMax API respects max_tokens (verified with live API call):

Setting                   completion_tokens   finish_reason
max_tokens=50             50                  length (truncated)
No max_tokens (default)   787                 stop (complete)

With max_tokens=50, the API returned exactly 50 tokens and set finish_reason=length, confirming the parameter is respected.
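The same check can be scripted against a response body. This is a rough sketch that assumes the endpoint returns the usual OpenAI-compatible chat completion shape (choices[0].finish_reason and usage.completion_tokens). The helper name was_truncated is hypothetical, and the sample dicts only mirror the numbers reported above rather than being captured responses.

```python
from typing import Any, Dict


def was_truncated(response: Dict[str, Any]) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    choices = response.get("choices") or []
    if not choices:
        return False
    return choices[0].get("finish_reason") == "length"


# Sample bodies shaped like an OpenAI-compatible response (illustrative only).
limited = {"choices": [{"finish_reason": "length"}], "usage": {"completion_tokens": 50}}
complete = {"choices": [{"finish_reason": "stop"}], "usage": {"completion_tokens": 787}}

print(was_truncated(limited))   # True
print(was_truncated(complete))  # False
```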

Note on OpenRouter + Claude Band-Aid

run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires max_tokens:

elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

This uses a hardcoded lookup table (_get_anthropic_max_output) and ignores the user's model.max_tokens config entirely. It was added as a workaround for the missing config passthrough, not a proper fix. Our fix makes config work for all providers including OpenRouter + Claude.

Provider Compatibility

This fix should be safe across providers:

  • If max_tokens is NOT in config.yaml → the fix extracts None → behavior is identical to before (no change)
  • If max_tokens IS in config.yaml → user explicitly configured it for their provider/model
  • Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
  • The existing band-aid for OpenRouter + Claude confirms that passing max_tokens to providers that support it is the intended behavior
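The "absent means None" behavior in the first bullet can be sketched as follows. The helper name extract_max_tokens is hypothetical, and the positive-integer validation is an extra safeguard assumed here for illustration. The proposed fix itself only performs the plain .get() lookup.

```python
from typing import Any, Dict, Optional


def extract_max_tokens(config: Dict[str, Any]) -> Optional[int]:
    """Return model.max_tokens from a parsed config, or None if unset or invalid."""
    model_cfg = config.get("model") or {}
    if not isinstance(model_cfg, dict):
        # The model key is not a mapping, so there is nothing to extract.
        return None
    value = model_cfg.get("max_tokens")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return None


print(extract_max_tokens({"model": {"default": "MiniMax-M2.7"}}))   # None
print(extract_max_tokens({"model": {"max_tokens": 8192}}))          # 8192
```

Returning None for a missing key keeps the existing code path unchanged, since the request builder only adds the parameter when the value is not None.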

Proposed Fix

Proof-of-concept on fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config

Changes:

  1. cli.py: Extract max_tokens = CLI_CONFIG["model"].get("max_tokens") and pass to AIAgent
  2. gateway/run.py: Add user_config parameter to _resolve_turn_agent_config(), extract max_tokens from user_config.get("model", {}).get("max_tokens"), include it in the primary dict so it flows through to AIAgent
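The gateway change in step 2 might look roughly like the sketch below. It is an illustration under stated assumptions, not the code on the branch: the AIAgent class here is a two-field stand-in, and resolve_turn_agent_config is a simplified, hypothetical version of _resolve_turn_agent_config() whose real signature and other dict keys are not shown in this issue.

```python
from typing import Any, Dict, Optional


class AIAgent:
    """Hypothetical stand-in for the real AIAgent; only the relevant fields."""

    def __init__(self, model: str, max_tokens: Optional[int] = None) -> None:
        self.model = model
        self.max_tokens = max_tokens


def resolve_turn_agent_config(
    model: str, user_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build the primary dict, including max_tokens from the user config."""
    model_cfg = (user_config or {}).get("model") or {}
    return {
        "model": model,
        "max_tokens": model_cfg.get("max_tokens"),  # None when not configured
    }


user_config = {"model": {"default": "MiniMax-M2.7", "max_tokens": 8192}}
primary = resolve_turn_agent_config("MiniMax-M2.7", user_config)
agent = AIAgent(**primary)
print(agent.max_tokens)  # 8192
```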

Affected Component

  • CLI (interactive chat)
  • Gateway (Telegram/Discord/Slack/WhatsApp)
  • Configuration (config.yaml, .env, hermes setup)

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
