fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) by daimon-nous[bot] · Pull Request #32012 · NousResearch/hermes-agent

daimon-nous · 2026-05-25T10:43:17Z

Summary

Mid-tool-call partial-stream-stubs now use finish_reason="length" instead of "stop", routing through the existing continuation machinery with targeted chunking guidance. Fixes the unrecoverable retry loop described in #31998.

Root cause: PR #30998 set finish_reason="stop" for safety (avoid auto-retrying side-effectful tools). But "stop" bypasses ALL continuation machinery → turn ends with warning text → user says "continue" → model retries the same large tool call → stream goes stale again → infinite loop.

Changes

agent/chat_completion_helpers.py: _stub_finish_reason "stop" → "length" for mid-tool-call partials; attach _dropped_tool_names to stub
agent/conversation_loop.py: third continuation prompt branch — when stub has dropped tool names, injects "break into smaller chunks (~8K tokens)" guidance instead of generic "continue where you left off"
tests/run_agent/test_partial_stream_finish_reason.py: updated assertions + new test_dropped_tool_call_uses_chunking_prompt
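As a rough illustration of the chat_completion_helpers.py change, the sketch below builds a mid-tool-call stub with the three properties this PR relies on: finish_reason="length", tool_calls=None, and a _dropped_tool_names list. It is not the project's code. The stub is modelled as a `SimpleNamespace`, and `build_mid_tool_call_stub` and the constant values are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: the follow-up commit moves the real constants into
# hermes_constants.py, and their exact values are not shown on this page.
PARTIAL_STREAM_STUB_ID = "partial-stream-stub"
FINISH_REASON_LENGTH = "length"


def build_mid_tool_call_stub(partial_text: str, dropped_tool_names: list[str]) -> SimpleNamespace:
    """Build the recovery stub for a stream that stalled mid-tool-call.

    - finish_reason is "length" so the loop takes the continuation path.
    - tool_calls stays None so nothing runs from the partial arguments.
    - _dropped_tool_names records which calls were lost, for the prompt builder.
    """
    return SimpleNamespace(
        id=PARTIAL_STREAM_STUB_ID,
        content=partial_text,
        finish_reason=FINISH_REASON_LENGTH,
        tool_calls=None,
        _dropped_tool_names=list(dropped_tool_names),
    )


if __name__ == "__main__":
    stub = build_mid_tool_call_stub("Writing the file now...", ["write_file"])
    print(stub.finish_reason, stub.tool_calls, stub._dropped_tool_names)
```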

Safety

tool_calls=None is preserved on the stub → conversation loop enters text-continuation branch (line 1513), NOT tool-execution branch (line 3246). No tool auto-executes.
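The safety argument depends on dispatch order. A minimal, hypothetical model of it follows; the function name and return labels are invented, and the real branches live in the conversation loop around the lines cited above. It shows that a stub with tool_calls=None can only reach the text-continuation or end-of-turn paths, never tool execution.

```python
from types import SimpleNamespace


def route_response(resp) -> str:
    """Simplified, hypothetical model of the conversation loop's dispatch order."""
    if resp.finish_reason == "length":
        if resp.tool_calls:
            # Backstop: truncated tool arguments are never executed.
            return "refuse_incomplete_tool_args"
        # The stub lands here: append a continuation prompt, call the model again.
        return "text_continuation"
    if resp.tool_calls:
        return "execute_tools"
    # "stop" with no tool calls ends the turn (the pre-fix stub behaviour).
    return "end_turn"


if __name__ == "__main__":
    before = SimpleNamespace(finish_reason="stop", tool_calls=None)
    after = SimpleNamespace(finish_reason="length", tool_calls=None)
    print(route_response(before), route_response(after))  # end_turn text_continuation
```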

Validation

| | Before | After |
|---|---|---|
| Mid-tool-call stub finish_reason | `stop` (exits turn) | `length` (enters continuation) |
| Continuation prompt | none (turn ended) | "break into smaller chunks" |
| User says "continue" | same large call → stale → loop | model uses smaller calls |
| Bounded retries | N/A | 3 max, then graceful exit |
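The "3 max, then graceful exit" behaviour can be modelled as a single counter. This sketch assumes one shared budget like the `length_continue_retries < 3` check; `run_turn` and `MAX_LENGTH_CONTINUATIONS` are hypothetical names, and "graceful exit" is reduced to a returned label.

```python
MAX_LENGTH_CONTINUATIONS = 3  # "3 max" from the table; the constant name is hypothetical


def run_turn(finish_reasons: list[str]) -> tuple[str, int]:
    """Replay scripted finish_reasons through a simplified continuation loop.

    Each "length" stub (tool_calls=None) consumes one continuation attempt.
    Once the budget is spent, the turn exits instead of looping forever.
    Returns (outcome, continuations_used).
    """
    length_continue_retries = 0
    for reason in finish_reasons:
        if reason != "length":
            return "completed", length_continue_retries
        if length_continue_retries >= MAX_LENGTH_CONTINUATIONS:
            return "graceful_exit", length_continue_retries
        length_continue_retries += 1  # send the chunking prompt, call the model again
    return "out_of_responses", length_continue_retries


if __name__ == "__main__":
    print(run_turn(["length"] * 10))     # ('graceful_exit', 3)
    print(run_turn(["length", "stop"]))  # ('completed', 1)
```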

./scripts/run_tests.sh tests/run_agent/test_partial_stream_finish_reason.py tests/run_agent/test_streaming.py tests/run_agent/test_stream_interrupt_retry.py
# 50 passed

Fixes #31998

…h continuation (#31998)

When a stream stalls mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery used finish_reason='stop', which caused the conversation loop to treat the turn as complete and return only the warning text. When users said 'continue', the model retried the same large tool call, hit the same stale timeout, and looped indefinitely.

Changes:
- chat_completion_helpers.py: change _stub_finish_reason from 'stop' to 'length' for mid-tool-call partials. The stub still has tool_calls=None, so no tool auto-executes; the model gets a fresh API call through the existing length-continuation machinery (bounded to 3 retries). Also attach _dropped_tool_names to the stub for downstream use.
- conversation_loop.py: add a third continuation prompt branch for partial-stream-stubs with dropped tool calls. Instead of the generic 'continue where you left off' (which would retry the same large call), tell the model to break the output into smaller tool calls (~8K tokens each) to avoid stream timeouts.
- test_partial_stream_finish_reason.py: update the existing test from finish_reason='stop' to 'length', add a _dropped_tool_names assertion, and add a new test_dropped_tool_call_uses_chunking_prompt for the 3-way prompt branching.

Safety: tool_calls=None is preserved on the stub, so the conversation loop enters the text-continuation branch (line 1513), NOT the tool-call execution branch (line 3246). No tool auto-executes. The model simply gets another API call with targeted guidance.

hclsys · 2026-05-25T10:44:07Z

Traced this against the #30998 safety concern it's reverting, and it threads the needle correctly:

The worry with going back from finish_reason="stop" to "length" would be re-executing a side-effectful tool. But the stub carries tool_calls=None, and the length-continuation path appends a system/user _continue_content message (conversation_loop.py:~1557) rather than reconstructing or replaying the dropped tool call, so nothing auto-executes. The existing _trunc_has_tool_calls guard ('refusing to execute incomplete tool arguments', ~1575) still backstops it. So the 'don't auto-retry side-effectful tools' intent of #30998 (fix(streaming): treat partial-stream stub as length truncation, not clean stop) is preserved; this only changes how the turn recovers, not what executes.
It's bounded: the dropped-tool branch shares the same length_continue_retries < 3 counter (~1515), so a model that keeps emitting oversized calls gives up after 3 chunked-retry attempts instead of looping forever. That is exactly the infinite loop from #31998 (Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls) that this fixes.
Plumbing _dropped_tool_names through the stub so the continuation prompt can name the offending tools is a nice touch, and the test asserting the guidance contains 'break' covers the new path.

Two minor notes, non-blocking: (1) the ~8K tokens chunk guidance and [:3] tool-name truncation are hardcoded in the prompt string — fine, but if the real per-call limit is derived elsewhere it'd be worth referencing the same constant so they don't drift. (2) The prompt is fairly prescriptive ('use multiple patch calls or write smaller files') — good for the common case, just slightly coupling the generic streaming layer to specific tool names. Neither blocks; the core fix is correct and safe. LGTM.

github-actions · 2026-05-25T10:44:29Z

🔎 Lint report: `fix/stale-stream-retry-loop` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9319 on HEAD, 9319 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4930 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

- Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH)
- Extract _get_continuation_prompt() in conversation_loop.py; this DRYs the 3-way prompt branching and lets tests import the real function
- Trim verbose inline comments in chat_completion_helpers.py
- Tests import constants + helper instead of duplicating logic

…h continuation (NousResearch#31998) (NousResearch#32012) * fix(streaming): route mid-tool-call partial-stream-stub through length continuation (NousResearch#31998) When a stream stalls mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery used finish_reason='stop' which caused the conversation loop to treat the turn as complete, returning only the warning text. When users said 'continue', the model retried the same large tool call, hit the same stale timeout, and looped indefinitely. Changes: - chat_completion_helpers.py: change _stub_finish_reason from 'stop' to 'length' for mid-tool-call partials. The stub still has tool_calls=None so no tool auto-executes — the model gets a fresh API call through the existing length-continuation machinery (bounded to 3 retries). Also attach _dropped_tool_names to the stub for downstream use. - conversation_loop.py: add a third continuation prompt branch for partial-stream-stubs with dropped tool calls. Instead of the generic 'continue where you left off' (which would retry the same large call), tell the model to break the output into smaller tool calls (~8K tokens each) to avoid stream timeouts. - test_partial_stream_finish_reason.py: update existing test from finish_reason='stop' to 'length', add _dropped_tool_names assertion, add new test_dropped_tool_call_uses_chunking_prompt for the 3-way prompt branching. Safety: tool_calls=None is preserved on the stub, so the conversation loop enters the text-continuation branch (line 1513), NOT the tool-call execution branch (line 3246). No tool auto-executes. The model simply gets another API call with targeted guidance. 
* refactor: extract constants and continuation prompt helper - Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH) - Extract _get_continuation_prompt() in conversation_loop.py — DRYs the 3-way prompt branching and lets tests import the real function - Trim verbose inline comments in chat_completion_helpers.py - Tests import constants + helper instead of duplicating logic --------- Co-authored-by: alt-glitch <balyan.sid@gmail.com>

alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and comp/agent (Core agent loop, run_agent.py, prompt builder) on May 25, 2026

daimon-nous Bot marked this pull request as ready for review May 25, 2026 11:26

alt-glitch merged commit ac5359a into main May 25, 2026
21 of 22 checks passed

alt-glitch deleted the fix/stale-stream-retry-loop branch May 25, 2026 12:13

bot-ted mentioned this pull request May 25, 2026

chore: sync with upstream main (2026-05-25) bot-ted/hermes-agent#50

Merged

alt-glitch merged 2 commits into main from fix/stale-stream-retry-loop