fix(agent_init): read max_tokens from custom_providers per-model config by luyao618 · Pull Request #28142 · NousResearch/hermes-agent

luyao618 · 2026-05-18T18:32:45Z

Summary

Fixes #28046 — max_tokens configured under custom_providers[].models.<model>.max_tokens was silently ignored. Output token requests fell back to the hard-coded 4096 default in chat_completion_helpers.py:269 / conversation_loop.py:2913, capping responses even when the user configured a higher per-model limit.

Root cause

agent/agent_init.py already had a per-model context_length lookup against custom_providers, but no equivalent for max_tokens. So self.max_tokens stayed None from the constructor default and the self.max_tokens or 4096 fallback kicked in at request time.
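The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the project's code: `AgentSketch` and `request_max_tokens` are hypothetical names, and only the `self.max_tokens or 4096` expression is taken from the description.

```python
DEFAULT_MAX_TOKENS = 4096  # the hard-coded default named in the description


class AgentSketch:
    """Hypothetical stand-in for the real agent class."""

    def __init__(self, max_tokens=None):
        # Nothing reads custom_providers here, so an unset value stays None.
        self.max_tokens = max_tokens

    def request_max_tokens(self):
        # Same shape as the `self.max_tokens or 4096` fallback at request time.
        return self.max_tokens or DEFAULT_MAX_TOKENS
```

With no explicit value, `AgentSketch().request_max_tokens()` falls through to the default, whatever the per-model config says.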

Fix

Added a parallel max_tokens lookup right after the existing context_length block in agent/agent_init.py:

Only runs when agent.max_tokens is None (don't override an explicit constructor / CLI value).
Matches on base_url then model against the custom_providers list — same matching as the context_length branch above.
Coerces via int(); positive values win, non-positive / non-numeric values log a warning via _ra().logger.warning and leave max_tokens unchanged (so the 4096 default still kicks in).
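The three rules above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in the diff: the function name, the logger name, and the trailing-slash normalisation of `base_url` are all hypothetical, and the real block works on the agent instance rather than returning a value.

```python
import logging

logger = logging.getLogger("custom_provider_sketch")  # hypothetical logger name


def resolve_custom_max_tokens(custom_providers, base_url, model, current=None):
    """Return the max_tokens to use, or None if nothing valid is configured."""
    if current is not None:
        return current  # explicit constructor / CLI value is never overridden

    wanted_url = (base_url or "").rstrip("/")  # normalisation is an assumption
    for provider in custom_providers or []:
        if not isinstance(provider, dict):
            continue
        if (provider.get("base_url") or "").rstrip("/") != wanted_url:
            continue
        model_cfg = (provider.get("models") or {}).get(model)
        if not isinstance(model_cfg, dict) or "max_tokens" not in model_cfg:
            continue

        raw = model_cfg["max_tokens"]
        try:
            value = int(raw)
        except (TypeError, ValueError):
            logger.warning("Ignoring non-numeric max_tokens %r for %s", raw, model)
            return None
        if value <= 0:
            logger.warning("Ignoring non-positive max_tokens %r for %s", raw, model)
            return None
        return value

    return None  # no match: the caller's 4096 default still applies
```

Returning `None` for rejected values keeps the request-time fallback as the single place where the default is chosen.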

Tests

tests/run_agent/test_custom_provider_max_tokens.py — 6 cases:

valid integer max_tokens is applied
string-numeric ("16000") parses and is applied
non-numeric ("32K") is rejected with a warning, stays None
zero is rejected with a warning, stays None
missing max_tokens leaves it None
explicit constructor max_tokens is not overridden by the custom_providers lookup

All 6 pass; broader tests/run_agent/ smoke run shows no regression.

Repro (from issue)

custom_providers:
  - name: xfyun
    base_url: https://maas-coding-api.cn-huabei-1.xf-yun.com/v2
    api_key: ${API_KEY}
    api_mode: chat_completions
    model: astron-code-latest
    models:
      astron-code-latest:
        context_length: 200000
        max_tokens: 32000
        reasoning: true

Before: responses cap at ~4096 tokens with finish_reason='length'.
After: configured max_tokens=32000 is honored.
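
In an OpenAI-style chat completions response, this kind of truncation appears as `finish_reason` set to `"length"` on a choice. A minimal check, assuming the response has already been parsed into a plain dict (`was_truncated` is a hypothetical helper, not part of the project):

```python
def was_truncated(response):
    """True if any choice stopped because it hit the output token limit."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )
```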

Risk

Narrow — additive block, gated on max_tokens is None and _custom_providers. Mirrors a well-trodden code path right above it. Default behavior (no custom_providers.models.<m>.max_tokens set) is unchanged.

Mirror the existing context_length lookup: when a user configures custom_providers[].models.<model>.max_tokens, honor it instead of falling back to the 4096 default in chat_completion_helpers / conversation_loop. Fixes NousResearch#28046.

BoardJames-Bot · 2026-05-18T18:47:52Z

Board James CI triage for the current head (2a8107c1):

build-arm64 has completed successfully now (8m), so the original arm64 pending item was just normal queue/runtime, not a branch-local Docker regression.
All other non-full-suite checks are green: lint/ty, attribution, common ancestor, e2e, Nix macOS/Ubuntu, supply-chain, build-amd64.
Tests / test is still in progress/pending; GitHub does not expose logs until the job completes. I let it poll for several minutes and it stayed pending rather than failing. Recent main Tests workflow runs around this window are being cancelled repeatedly by repo churn/concurrency, so if this one later cancels, it matches the shared CI drift pattern rather than this PR’s two-file change.
Local focused validation on the PR worktree passes: /Users/spencer/.hermes/hermes-agent/venv/bin/python -m pytest tests/run_agent/test_custom_provider_max_tokens.py -q → 6 passed in 1.78s.

Owner/maintainer action: no branch change requested from this triage. Let Tests / test finish; if it gets cancelled by concurrency, rerun that check once the queue is quieter.

digitalbase · 2026-05-21T07:13:22Z

I bumped into the exact same issue and wanted to contribute.

I switch between gpt-5.5 and opus-4.7 all the time, and I often get "response truncated due to output length limit" mid-reply.

Although this fix is great for custom providers, it doesn't solve the issue with OpenRouter or Bedrock, where you can switch between models from the same provider. I was thinking we need a shape more like this:

model:
  provider: bedrock
  default: anthropic.claude-opus-4-7
  max_tokens: 16384                        # global fallback (unchanged behavior)
  per_model_max_tokens:                # new
    anthropic.claude-opus-4-7: 24576
    gpt-5.5: 65536
    anthropic/claude-sonnet-4.6: 32768
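
One way the shape above could be resolved, as a rough sketch only: `per_model_max_tokens` is the key proposed in this comment rather than an existing option, and `resolve_max_tokens` is a hypothetical helper operating on the already-parsed `model` section.

```python
def resolve_max_tokens(model_cfg, active_model):
    """Per-model override first, then the global max_tokens, else None."""
    per_model = model_cfg.get("per_model_max_tokens") or {}
    if active_model in per_model:
        return int(per_model[active_model])
    global_value = model_cfg.get("max_tokens")
    return int(global_value) if global_value is not None else None


# The proposed config from above, as a parsed dict.
config = {
    "provider": "bedrock",
    "default": "anthropic.claude-opus-4-7",
    "max_tokens": 16384,
    "per_model_max_tokens": {
        "anthropic.claude-opus-4-7": 24576,
        "gpt-5.5": 65536,
        "anthropic/claude-sonnet-4.6": 32768,
    },
}
```

Here `resolve_max_tokens(config, "gpt-5.5")` picks the per-model entry, while a model with no entry gets the global 16384.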

I'd love to hear some thoughts. I'll see if I can file a PR.

Update: maybe #24495 is what I was thinking of.

luyao618 · 2026-05-31T06:00:05Z

Closing — open too long, no longer relevant.

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and area/config (Config system, migrations, profiles) labels May 18, 2026

digitalbase mentioned this pull request May 21, 2026

feat(config): add per-model max_tokens overlay #29705

Open

banditburai mentioned this pull request May 30, 2026

feat(custom_providers): per-model max_tokens with switch/fallback re-resolution #35518

Open

luyao618 closed this May 31, 2026

luyao618 wants to merge 1 commit into NousResearch:main from luyao618:fix/custom-provider-max-tokens
