fix: wire ephemeral max_tokens into chat_completions + NVIDIA NIM default by kshitijk4poor · Pull Request #12231 · NousResearch/hermes-agent

kshitijk4poor · 2026-04-18T17:19:58Z

Summary

Salvage of #12152 by @LVT382009.

What this fixes

_ephemeral_max_output_tokens not consumed by chat_completions — The error-recovery ephemeral override (set when the API returns "max_tokens too large given prompt") was only consumed in the anthropic_messages branch of _build_api_kwargs. All chat_completions providers (OpenRouter, NVIDIA NIM, Qwen, Alibaba, custom, etc.) silently ignored it. Now consumed at highest priority in the cascade, matching the anthropic pattern.
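NVIDIA NIM max_tokens default (16384) — NVIDIA NIM falls back to a very low internal default when max_tokens is omitted, causing models like GLM-4.7 to truncate immediately (thinking tokens exhaust the budget before the response starts). Requests to NVIDIA NIM now default to max_tokens=16384 when no other value is set; other providers still omit the field.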
Progressive length-continuation boost — When finish_reason='length' triggers a continuation retry, the output budget now grows progressively (2x base on retry 1, 3x on retry 2, capped at 32768) via _ephemeral_max_output_tokens. Previously the retry loop re-sent the same token limit on all 3 attempts.
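
For reviewers, the priority cascade and the boost math described above can be sketched roughly as follows. This is an illustrative sketch, not the code in run_agent.py: the function names `resolve_max_tokens` and `boosted_max_tokens`, the provider string `"nvidia"`, and the assumption that the user-configured value outranks the NIM default are all hypothetical. Only the 16384 default, the 2x/3x multipliers, and the 32768 cap come from this PR.

```python
from typing import Optional

NIM_DEFAULT_MAX_TOKENS = 16384  # default stated in this PR
BOOST_CAP = 32768               # cap stated in this PR


def resolve_max_tokens(
    provider: str,
    ephemeral: Optional[int] = None,
    user_max_tokens: Optional[int] = None,
) -> Optional[int]:
    """Return the max_tokens to send, or None to omit the field.

    Assumed priority: ephemeral override > user setting > NIM default > omit.
    """
    if ephemeral is not None:
        # Error-recovery or length-continuation override wins over everything.
        return ephemeral
    if user_max_tokens is not None:
        return user_max_tokens
    if provider == "nvidia":  # hypothetical provider identifier
        return NIM_DEFAULT_MAX_TOKENS
    # Other chat_completions providers: leave max_tokens out of the request.
    return None


def boosted_max_tokens(base: int, retry: int) -> int:
    """Budget for a length-continuation retry: 2x base on retry 1, 3x on retry 2, capped."""
    return min(base * (retry + 1), BOOST_CAP)


if __name__ == "__main__":
    print(resolve_max_tokens("nvidia"))                      # NIM default applies
    print(resolve_max_tokens("openrouter"))                  # field omitted
    print(resolve_max_tokens("openrouter", ephemeral=8192))  # ephemeral wins
    print(boosted_max_tokens(4096, 1), boosted_max_tokens(4096, 2))
```

In this sketch the boosted value would be stored in _ephemeral_max_output_tokens before the retry, so the next request picks it up through the first branch of the cascade.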

Test plan

pytest tests/run_agent/ tests/agent/test_auxiliary_client.py — 851 passed (3 pre-existing failures on main)
E2E: 9 scenarios covering the full priority cascade (NIM default, ephemeral priority, user override, non-NVIDIA omission, OpenRouter ephemeral, boost math, cap)

Closes #12152

github-actions · 2026-04-18T17:20:12Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Container build files modified

Changes to Dockerfiles or compose files can alter base images, add build steps, or expose ports. Verify base image pins and build commands.

Files:

Dockerfile

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@LVT382009

…NVIDIA NIM default

Based on #12152 by @LVT382009. Three fixes to run_agent.py:

1. _ephemeral_max_output_tokens consumption in chat_completions path: The error-recovery ephemeral override was only consumed in the anthropic_messages branch of _build_api_kwargs. All chat_completions providers (OpenRouter, NVIDIA NIM, Qwen, Alibaba, custom, etc.) silently ignored it. Now consumed at highest priority, matching the anthropic pattern.
2. NVIDIA NIM max_tokens default (16384): NVIDIA NIM falls back to a very low internal default when max_tokens is omitted, causing models like GLM-4.7 to truncate immediately (thinking tokens exhaust the budget before the response starts).
3. Progressive length-continuation boost: When finish_reason='length' triggers a continuation retry, the output budget now grows progressively (2x base on retry 1, 3x on retry 2, capped at 32768) via _ephemeral_max_output_tokens. Previously the retry loop just re-sent the same token limit on all 3 attempts.

kshitijk4poor mentioned this pull request Apr 18, 2026

meta: NVIDIA NIM provider parity tracker #12233

Closed

teknium1 force-pushed the salvage/nvidia-nim-ephemeral-max-tokens branch from 4954768 to f38de37 Compare April 18, 2026 19:51

teknium1 merged commit f7af90e into main Apr 18, 2026
5 of 7 checks passed

teknium1 deleted the salvage/nvidia-nim-ephemeral-max-tokens branch April 18, 2026 19:51

teknium1 mentioned this pull request Apr 18, 2026

fix: NVIDIA NIM models always truncate due to missing max_tokens default and ephemeral boost not wired to chat_completions #12152

Closed

alt-glitch mentioned this pull request Apr 27, 2026

fix(agent): boost max_tokens on length-continuation retries #9489

Closed

teknium1 merged 1 commit into main from salvage/nvidia-nim-ephemeral-max-tokens
