
fix(agent_init): read max_tokens from custom_providers per-model config #28142

Closed
luyao618 wants to merge 1 commit into NousResearch:main from luyao618:fix/custom-provider-max-tokens

Conversation

@luyao618

Contributor

Summary

Fixes #28046. max_tokens configured under custom_providers[].models.<model>.max_tokens was silently ignored. Output token requests fell back to the hard-coded 4096 default in chat_completion_helpers.py:269 / conversation_loop.py:2913, capping responses even when the user configured a higher per-model limit.

Root cause

agent/agent_init.py already had a per-model context_length lookup against custom_providers, but no equivalent for max_tokens. So self.max_tokens stayed None from the constructor default and the self.max_tokens or 4096 fallback kicked in at request time.
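
The fallback can be seen in isolation. This is a minimal sketch, not the project's code: `effective_max_tokens` is a hypothetical helper, and only the `max_tokens or 4096` expression is taken from the description above.

```python
DEFAULT_MAX_TOKENS = 4096  # hard-coded request-time default described above


def effective_max_tokens(max_tokens):
    """Hypothetical helper mirroring the `self.max_tokens or 4096` fallback."""
    return max_tokens or DEFAULT_MAX_TOKENS


# The per-model value from custom_providers was never copied onto the agent,
# so the attribute was still None when the request was built.
print(effective_max_tokens(None))   # 4096
print(effective_max_tokens(32000))  # 32000
```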

Fix

Added a parallel max_tokens lookup right after the existing context_length block in agent/agent_init.py:

  • Only runs when agent.max_tokens is None (don't override an explicit constructor / CLI value).
  • Matches on base_url then model against the custom_providers list — same matching as the context_length branch above.
  • Coerces via int(); positive values win, non-positive / non-numeric values log a warning via _ra().logger.warning and leave max_tokens unchanged (so the 4096 default still kicks in).
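
The three rules above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in the PR: `resolve_custom_max_tokens` and its parameters are hypothetical names, the standard `logging` module stands in for `_ra().logger`, and comparing base URLs after stripping trailing slashes is an assumption about how the matching works.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_custom_max_tokens(custom_providers, base_url, model, current=None):
    """Return the max_tokens to use, or None if nothing valid is configured.

    Hypothetical stand-in for the lookup described above.
    """
    if current is not None:
        # An explicit constructor / CLI value always wins.
        return current
    for provider in custom_providers or []:
        if not isinstance(provider, dict):
            continue
        # Assumption: base URLs match after stripping trailing slashes.
        if str(provider.get("base_url", "")).rstrip("/") != str(base_url).rstrip("/"):
            continue
        model_cfg = (provider.get("models") or {}).get(model)
        if not isinstance(model_cfg, dict) or "max_tokens" not in model_cfg:
            continue
        raw = model_cfg["max_tokens"]
        try:
            value = int(raw)
        except (TypeError, ValueError):
            logger.warning("Ignoring non-numeric max_tokens %r for %s", raw, model)
            return None
        if value <= 0:
            logger.warning("Ignoring non-positive max_tokens %r for %s", raw, model)
            return None
        return value
    # No matching provider/model entry: leave max_tokens unset.
    return None
```

Returning None in the rejected cases leaves the request-time 4096 default in effect, which matches the behavior described in the last bullet.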

Tests

tests/run_agent/test_custom_provider_max_tokens.py — 6 cases:

  1. valid integer max_tokens is applied
  2. string-numeric ("16000") parses and is applied
  3. non-numeric ("32K") is rejected with a warning, stays None
  4. zero is rejected with a warning, stays None
  5. missing max_tokens leaves it None
  6. explicit constructor max_tokens is not overridden by the custom_providers lookup

All 6 pass; broader tests/run_agent/ smoke run shows no regression.

Repro (from issue)

custom_providers:
  - name: xfyun
    base_url: https://maas-coding-api.cn-huabei-1.xf-yun.com/v2
    api_key: ${API_KEY}
    api_mode: chat_completions
    model: astron-code-latest
    models:
      astron-code-latest:
        context_length: 200000
        max_tokens: 32000
        reasoning: true

Before: responses cap at ~4096 tokens with finish_reason='length'.
After: configured max_tokens=32000 is honored.

Risk

Narrow — additive block, gated on max_tokens is None and _custom_providers. Mirrors a well-trodden code path right above it. Default behavior (no custom_providers.models.<m>.max_tokens set) is unchanged.

Mirror the existing context_length lookup: when a user configures
custom_providers[].models.<model>.max_tokens, honor it instead of
falling back to the 4096 default in chat_completion_helpers /
conversation_loop.

Fixes NousResearch#28046.
@alt-glitch added labels on May 18, 2026: type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), area/config (Config system, migrations, profiles)
@BoardJames-Bot


Board James CI triage for the current head (2a8107c1):

  • build-arm64 has completed successfully now (8m), so the original arm64 pending item was just normal queue/runtime, not a branch-local Docker regression.
  • All other non-full-suite checks are green: lint/ty, attribution, common ancestor, e2e, Nix macOS/Ubuntu, supply-chain, build-amd64.
  • Tests / test is still in progress/pending; GitHub does not expose logs until the job completes. I let it poll for several minutes and it stayed pending rather than failing. Recent main Tests workflow runs around this window are being cancelled repeatedly by repo churn/concurrency, so if this one later cancels, it matches the shared CI drift pattern rather than this PR’s two-file change.
  • Local focused validation on the PR worktree passes: /Users/spencer/.hermes/hermes-agent/venv/bin/python -m pytest tests/run_agent/test_custom_provider_max_tokens.py -q reports 6 passed in 1.78s.

Owner/maintainer action: no branch change requested from this triage. Let Tests / test finish; if it gets cancelled by concurrency, rerun that check once the queue is quieter.

@digitalbase

digitalbase commented May 21, 2026


I bumped into the exact same issue and wanted to contribute.

I switch between gpt-5.5 and opus-4.7 all the time, and I often get "response truncated due to output length limit" mid-reply.

Although this fix is great for custom providers, it doesn't solve the issue with openrouter or bedrock, where you can switch between models from the same provider. I was thinking we need a shape more like:

model:
  provider: bedrock
  default: anthropic.claude-opus-4-7
  max_tokens: 16384                        # global fallback (unchanged behavior)
  per_model_max_tokens:                # new
    anthropic.claude-opus-4-7: 24576
    gpt-5.5: 65536
    anthropic/claude-sonnet-4.6: 32768
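
For illustration, the shape above could be resolved roughly as follows. This is a sketch only: `per_model_max_tokens` is the key proposed in this comment rather than an existing config option, `resolve_max_tokens` is a hypothetical helper, and the 4096 default is the one mentioned in the PR description.

```python
def resolve_max_tokens(model_cfg, model_name, default=4096):
    """Hypothetical resolver for the proposed config shape.

    Precedence: per-model override, then global max_tokens, then the default.
    """
    per_model = model_cfg.get("per_model_max_tokens") or {}
    if model_name in per_model:
        return int(per_model[model_name])
    return int(model_cfg.get("max_tokens") or default)


# The `model:` section from the example above, already parsed into a dict.
model_cfg = {
    "provider": "bedrock",
    "default": "anthropic.claude-opus-4-7",
    "max_tokens": 16384,
    "per_model_max_tokens": {
        "anthropic.claude-opus-4-7": 24576,
        "gpt-5.5": 65536,
        "anthropic/claude-sonnet-4.6": 32768,
    },
}

print(resolve_max_tokens(model_cfg, "gpt-5.5"))        # 65536
print(resolve_max_tokens(model_cfg, "unlisted-model"))  # 16384
```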

I'd love to hear some thoughts. I'll see if I can file a PR.

Update: maybe #24495 is what I was thinking about.

@luyao618

Contributor Author

Closing — open too long, no longer relevant.

@luyao618 closed this on May 31, 2026

Development

Successfully merging this pull request may close these issues.

Bug: max_tokens not read from custom_providers per-model config, always defaults to 4096