
fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) #32012

Merged
alt-glitch merged 2 commits into main from fix/stale-stream-retry-loop
May 25, 2026

Conversation

@daimon-nous

@daimon-nous daimon-nous Bot commented May 25, 2026

Contributor

Summary

Mid-tool-call partial-stream-stubs now use finish_reason="length" instead of "stop", routing through the existing continuation machinery with targeted chunking guidance. Fixes the unrecoverable retry loop described in #31998.

Root cause: PR #30998 set finish_reason="stop" for safety (to avoid auto-retrying side-effectful tools). But "stop" bypasses ALL continuation machinery → the turn ends with warning text → the user says "continue" → the model retries the same large tool call → the same stale-stream timeout → infinite loop.

Changes

  • agent/chat_completion_helpers.py: _stub_finish_reason "stop" → "length" for mid-tool-call partials; attach _dropped_tool_names to stub
  • agent/conversation_loop.py: third continuation prompt branch — when stub has dropped tool names, injects "break into smaller chunks (~8K tokens)" guidance instead of generic "continue where you left off"
  • tests/run_agent/test_partial_stream_finish_reason.py: updated assertions + new test_dropped_tool_call_uses_chunking_prompt
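
A minimal sketch of the stub change listed above, assuming a simplified stand-in for the real response object. `StubMessage` and `build_mid_tool_call_stub` are hypothetical names for illustration; only `finish_reason`, `tool_calls=None`, and `_dropped_tool_names` come from this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubMessage:
    """Hypothetical stand-in for the partial-stream stub; not the real class."""
    content: str
    finish_reason: str
    tool_calls: Optional[list] = None  # stays None so no tool can auto-execute
    _dropped_tool_names: List[str] = field(default_factory=list)


def build_mid_tool_call_stub(partial_text: str, dropped_tool_names: List[str]) -> StubMessage:
    """Build the stub for a stream that stalled while a tool call was in flight."""
    return StubMessage(
        content=partial_text,
        # Was "stop" before this PR; "length" routes into the continuation path.
        finish_reason="length",
        tool_calls=None,
        # Only the tool names are carried in this sketch.
        _dropped_tool_names=list(dropped_tool_names),
    )


stub = build_mid_tool_call_stub("Writing the report now.", ["write_file"])
print(stub.finish_reason, stub.tool_calls, stub._dropped_tool_names)
```

The point of the sketch is that the stub changes how the turn is classified without carrying anything executable.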

Safety

tool_calls=None is preserved on the stub → conversation loop enters text-continuation branch (line 1513), NOT tool-execution branch (line 3246). No tool auto-executes.
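
The safety argument can be illustrated with a small routing sketch. This is an assumption about the shape of the branching, not the code at the cited line numbers; `choose_branch` and the branch labels are hypothetical.

```python
from types import SimpleNamespace


def choose_branch(message) -> str:
    """Pick which (hypothetical) branch of the loop handles a model response."""
    # A truncated response with no tool calls goes to text continuation.
    if message.finish_reason == "length" and not message.tool_calls:
        return "text-continuation"
    # Only responses that actually carry tool calls can reach execution.
    if message.tool_calls:
        return "tool-execution"
    return "turn-complete"


stub = SimpleNamespace(finish_reason="length", tool_calls=None)
print(choose_branch(stub))
```

Because the stub keeps `tool_calls=None`, it cannot satisfy the tool-execution condition in this sketch regardless of its `finish_reason`.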

Validation

| | Before | After |
|---|---|---|
| Mid-tool-call stub finish_reason | stop (exits turn) | length (enters continuation) |
| Continuation prompt | none (turn ended) | "break into smaller chunks" |
| User says "continue" | same large call → stale → loop | model uses smaller calls |
| Bounded retries | N/A | 3 max, then graceful exit |
./scripts/run_tests.sh tests/run_agent/test_partial_stream_finish_reason.py tests/run_agent/test_streaming.py tests/run_agent/test_stream_interrupt_retry.py
# 50 passed
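
The "3 max, then graceful exit" behavior can be sketched as a bounded loop. This is a simplified simulation under stated assumptions: `run_turn`, `always_stalls`, and `recovers_when_guided` are hypothetical, and the fake models return only a finish reason.

```python
from typing import Callable, Optional, Tuple

MAX_LENGTH_CONTINUE_RETRIES = 3  # mirrors the "3 max" bound described in this PR


def run_turn(call_model: Callable[[Optional[str]], str]) -> Tuple[str, int]:
    """Call the model until it finishes or the continuation budget is spent."""
    retries = 0
    prompt: Optional[str] = None
    while True:
        finish_reason = call_model(prompt)
        if finish_reason != "length":
            return "completed", retries
        if retries >= MAX_LENGTH_CONTINUE_RETRIES:
            return "gave_up", retries  # graceful exit instead of looping forever
        retries += 1
        prompt = "break into smaller chunks"


def always_stalls(prompt: Optional[str]) -> str:
    """Fake model that keeps emitting an oversized call."""
    return "length"


def recovers_when_guided(prompt: Optional[str]) -> str:
    """Fake model that succeeds once it receives chunking guidance."""
    return "stop" if prompt else "length"


print(run_turn(always_stalls))
print(run_turn(recovers_when_guided))
```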

Fixes #31998

…h continuation (#31998)

When a stream stalls mid-tool-call (e.g. a large write_file), the
partial-stream-stub recovery used finish_reason='stop' which caused the
conversation loop to treat the turn as complete, returning only the
warning text. When users said 'continue', the model retried the same
large tool call, hit the same stale timeout, and looped indefinitely.

Changes:
- chat_completion_helpers.py: change _stub_finish_reason from 'stop' to
  'length' for mid-tool-call partials. The stub still has tool_calls=None
  so no tool auto-executes — the model gets a fresh API call through the
  existing length-continuation machinery (bounded to 3 retries).
  Also attach _dropped_tool_names to the stub for downstream use.
- conversation_loop.py: add a third continuation prompt branch for
  partial-stream-stubs with dropped tool calls. Instead of the generic
  'continue where you left off' (which would retry the same large call),
  tell the model to break the output into smaller tool calls (~8K
  tokens each) to avoid stream timeouts.
- test_partial_stream_finish_reason.py: update existing test from
  finish_reason='stop' to 'length', add _dropped_tool_names assertion,
  add new test_dropped_tool_call_uses_chunking_prompt for the 3-way
  prompt branching.

Safety: tool_calls=None is preserved on the stub, so the conversation
loop enters the text-continuation branch (line 1513), NOT the tool-call
execution branch (line 3246). No tool auto-executes. The model simply
gets another API call with targeted guidance.
@hclsys

hclsys commented May 25, 2026


Traced this against the #30998 safety concern it's reverting, and it threads the needle correctly:

  • The worry with going back from finish_reason="stop" to "length" would be re-executing a side-effectful tool. But the stub carries tool_calls=None, and the length-continuation path appends a system/user _continue_content message (conversation_loop.py:~1557) rather than reconstructing/replaying the dropped tool call, so nothing auto-executes. The existing _trunc_has_tool_calls guard ('refusing to execute incomplete tool arguments', ~1575) still backstops it. So the 'don't auto-retry side-effectful tools' intent of #30998 ("fix(streaming): treat partial-stream stub as length truncation, not clean stop") is preserved; this only changes how the turn recovers, not what executes.
  • It's bounded: the dropped-tool branch shares the same length_continue_retries < 3 counter (~1515), so a model that keeps emitting oversized calls gives up after 3 chunked-retry attempts instead of looping forever, which is exactly the infinite loop from #31998 ("Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls") that this fixes.
  • _dropped_tool_names plumbed through the stub so the continuation prompt can name the offending tools is a nice touch, and the test asserting the guidance contains 'break' covers the new path.

Two minor notes, non-blocking: (1) the ~8K tokens chunk guidance and [:3] tool-name truncation are hardcoded in the prompt string — fine, but if the real per-call limit is derived elsewhere it'd be worth referencing the same constant so they don't drift. (2) The prompt is fairly prescriptive ('use multiple patch calls or write smaller files') — good for the common case, just slightly coupling the generic streaming layer to specific tool names. Neither blocks; the core fix is correct and safe. LGTM.
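
One way to address note (1) would be to build the guidance from named constants rather than literals inside the prompt string. The sketch below is only an illustration: `CHUNK_TOKEN_HINT`, `MAX_NAMED_TOOLS`, `chunking_guidance`, and the prompt wording are hypothetical and do not reflect the merged code.

```python
from typing import Sequence

# Hypothetical constants; the PR hardcodes "~8K tokens" and a [:3] slice.
CHUNK_TOKEN_HINT = 8_000
MAX_NAMED_TOOLS = 3


def chunking_guidance(dropped_tool_names: Sequence[str]) -> str:
    """Build chunking guidance that names at most MAX_NAMED_TOOLS dropped tools."""
    named = ", ".join(dropped_tool_names[:MAX_NAMED_TOOLS])
    return (
        f"The previous tool call ({named}) was cut off by a stream timeout. "
        f"Break the output into smaller tool calls "
        f"(~{CHUNK_TOKEN_HINT // 1000}K tokens each)."
    )


print(chunking_guidance(["write_file", "patch", "tool_c", "tool_d"]))
```

If a real per-call limit exists elsewhere in the codebase, the same constant could feed both places so the prompt and the limit cannot drift apart.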

@github-actions

github-actions Bot commented May 25, 2026

Contributor

🔎 Lint report: fix/stale-stream-retry-loop vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9319 on HEAD, 9319 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4930 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels May 25, 2026
@daimon-nous daimon-nous Bot marked this pull request as ready for review May 25, 2026 11:26
refactor: extract constants and continuation prompt helper

- Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID,
  FINISH_REASON_LENGTH)
- Extract _get_continuation_prompt() in conversation_loop.py — DRYs the
  3-way prompt branching and lets tests import the real function
- Trim verbose inline comments in chat_completion_helpers.py
- Tests import constants + helper instead of duplicating logic
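
A rough sketch of what a 3-way prompt helper driven by shared constants might look like. Everything here is an assumption made for illustration: the constant values, the function name `continuation_prompt_sketch`, its signature, the prompt wording, and the exact conditions of the two non-chunking branches are not shown on this page and may differ from the real `_get_continuation_prompt()`.

```python
from typing import Optional, Sequence

# Placeholder values: the PR names these constants but does not show their values here.
PARTIAL_STREAM_STUB_ID = "partial-stream-stub"
FINISH_REASON_LENGTH = "length"


def continuation_prompt_sketch(
    finish_reason: str,
    response_id: str,
    dropped_tool_names: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return a continuation prompt, or None when no continuation applies."""
    if finish_reason != FINISH_REASON_LENGTH:
        return None
    is_stub = response_id == PARTIAL_STREAM_STUB_ID
    if is_stub and dropped_tool_names:
        # Branch added by this PR: targeted chunking guidance.
        names = ", ".join(dropped_tool_names[:3])
        return (
            f"Your {names} call was dropped by a stream timeout. "
            "Please break the output into smaller tool calls (~8K tokens each)."
        )
    if is_stub:
        # Assumed branch: stub without dropped tool calls.
        return "The stream was interrupted. Continue where you left off."
    # Assumed branch: ordinary length truncation.
    return "Continue where you left off."


prompt = continuation_prompt_sketch(FINISH_REASON_LENGTH, PARTIAL_STREAM_STUB_ID, ["write_file"])
assert "break" in prompt  # same style of check the reviewer mentions
```

Keeping the branching in one importable function is what lets a test exercise the real logic instead of a copy of it.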
@alt-glitch alt-glitch merged commit ac5359a into main May 25, 2026
21 of 22 checks passed
@alt-glitch alt-glitch deleted the fix/stale-stream-retry-loop branch May 25, 2026 12:13
daletkc pushed a commit to daletkc/hermes-agent that referenced this pull request May 25, 2026
…h continuation (NousResearch#31998) (NousResearch#32012)

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026