fix(agent): boost max_tokens on length-continuation retries by ygd58 · Pull Request #9489 · NousResearch/hermes-agent

ygd58 · 2026-04-14T07:02:54Z

Problem

Models with a low default output limit (e.g. GLM-4.7 on NVIDIA Build) truncate at the same limit on every continuation attempt, exhausting all 3 retries without ever finishing.

Root Cause

The continuation loop sends the continue prompt but calls the API with the same max_tokens each time. If the model hits the limit on the first call, it hits it again on every retry.

Fix

Set _ephemeral_max_output_tokens before each continuation retry, growing the budget per attempt (capped at 32K): Retry 1 = max_tokens x 2, Retry 2 = max_tokens x 3.

Also extended _ephemeral_max_output_tokens consumption to the chat_completions path in _build_api_kwargs() so both chat_completions and anthropic_messages modes benefit.

Before / After

Before: GLM-4.7 truncated x3 at same limit, error.
After: GLM-4.7 retried with growing limit, more room to finish.

Models with a low default output limit (e.g. GLM-4.7 on NVIDIA Build) truncate at the same limit on every continuation attempt, exhausting all 3 retries without ever finishing the response. Fix: set _ephemeral_max_output_tokens before each continuation retry, doubling the output budget per attempt (capped at 32K). This applies to both chat_completions and anthropic_messages modes: - chat_completions: _ephemeral_max_output_tokens now consumed in _build_api_kwargs() alongside the existing anthropic path - Retry 1: max_tokens * 2, Retry 2: max_tokens * 3 (max 32K) Fixes NousResearch#9372

alt-glitch · 2026-04-27T06:05:50Z

Related to #12231 (merged) which wired ephemeral max_tokens into chat_completions. Check if the continuation-retry boost in this PR is already covered by that merge.

teknium1 · 2026-06-10T08:49:19Z

Thanks for this contribution @ygd58! The fixes described in this PR are already present on main.

This is an automated hermes-sweeper review.

Evidence:

run_agent.py lines 12810–12812: progressive length-continuation boost (_boost_base * (length_continue_retries + 1), capped at 32 768) setting _ephemeral_max_output_tokens on each retry — exactly matching this PR's approach.
run_agent.py lines 8394–8396: _ephemeral_max_output_tokens consumed in the chat_completions branch of _build_api_kwargs, covering OpenRouter, NVIDIA NIM, and all other non-Anthropic providers.
Implementing commit: f7af90e2d — "fix: wire _ephemeral_max_output_tokens into chat_completions and add NVIDIA NIM default" — whose commit message describes all three fixes verbatim.

As noted by @alt-glitch, PR #12231 (merged) covered the same ground. Closing as implemented on main.

ygd58 mentioned this pull request Apr 14, 2026

[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns model hit max output tokens #9372

Closed

1 task

uwings mentioned this pull request Apr 24, 2026

feat(config): support per-model max_tokens in custom_providers config #15037

Open

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 27, 2026

daimon-nous Bot mentioned this pull request May 15, 2026

[Bug]: Response truncated due to output length limit — still occurring after #7237 fix (re-opening closed issue) #26425

Closed

1 task

teknium1 closed this Jun 10, 2026

teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): boost max_tokens on length-continuation retries#9489

fix(agent): boost max_tokens on length-continuation retries#9489
ygd58 wants to merge 1 commit into
NousResearch:mainfrom
ygd58:fix/continuation-max-tokens-boost

ygd58 commented Apr 14, 2026

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

teknium1 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ygd58 commented Apr 14, 2026

Problem

Root Cause

Fix

Before / After

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

teknium1 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants