fix(streaming): treat partial-stream stub as length truncation, not clean stop by xxxigm · Pull Request #30998 · NousResearch/hermes-agent

xxxigm · 2026-05-23T15:16:53Z

What does this PR do?

Stops the agent loop from exiting with iteration budget remaining when an upstream connection drops mid-stream. Fixes the misclassification described in #30963 where a partial-stream recovery stub was reported as text_response(finish_reason=stop) and the loop terminated even though the goal-judge verdict came back as continue milliseconds later.

The recovery stub returned by chat_completion_helpers.interruptible_streaming_api_call already carries a distinct id (partial-stream-stub), so two surgical fixes are enough:

Stub finish_reason (agent/chat_completion_helpers.py) — text-only partial → finish_reason="length" so the existing length-continuation path (3 retries, partial parts merged into final_response) fires automatically. Mid-tool-call partials keep finish_reason="stop" on purpose: their user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry, not the agent.
Truthful continuation prompt (agent/conversation_loop.py) — when the response id is partial-stream-stub, swap the "[System: your response was truncated by the output length limit…]" prompt for a network-error variant ("[System: cut off by a network error mid-stream…]"). Real length truncations still see the original wording. Lying to the model leads to no-op responses ("I wasn't truncated, I'm done") that defeat the continuation.

#5544's "no duplicate message" contract is preserved verbatim — the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta.

Related Issue

Fixes #30963

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

agent/chat_completion_helpers.py — split the partial-stream stub into two branches: text-only emits finish_reason="length", mid-tool-call keeps "stop". Logging line updated to match.
agent/conversation_loop.py — length-handler vprint and continuation prompt now branch on response.id == "partial-stream-stub" to use the network-error wording.
tests/run_agent/test_partial_stream_finish_reason.py — 6 new tests locking in: stub finish_reason for both branches, prompt-branching truth-table, and an end-to-end run_conversation integration that drives a stub + continuation pair and verifies the loop makes the second API call (not exits) with the network-error prompt reaching the model.

Backwards compatible: the public stub shape is unchanged (still id="partial-stream-stub"), no config keys added, no schema changes. Real finish_reason="length" responses keep the existing prompt and behaviour.

How to Test

# New regression tests (6 tests, ~3s)
./scripts/run_tests.sh tests/run_agent/test_partial_stream_finish_reason.py

# Adjacent suites (streaming infra + run_agent integration), no regressions
./scripts/run_tests.sh tests/run_agent/test_streaming.py \
    tests/run_agent/test_stream_interrupt_retry.py \
    tests/run_agent/test_partial_stream_finish_reason.py \
    tests/run_agent/test_run_agent.py
# expected: all passing (test_run_agent.py has pre-existing sandbox-permission
# failures on darwin 24.6.0 — confirmed unchanged on upstream/main with
# 42 failed, 35 passed, 262 errors before AND after this PR)

End-to-end behaviour after the fix (mirrors the user's #30963 log):

WARNING agent.chat_completion_helpers:
  Partial stream delivered before error; returning length-truncated stub
  with 109 chars of recovered content so the loop can continue from where
  the stream died: peer closed connection without sending complete
  message body (incomplete chunked read)

⚠️  Stream interrupted by network error
   (finish_reason='length' on partial-stream-stub)
↻ Stream interrupted — requesting continuation (1/3)...

[next API call sent with the network-error continuation prompt;
 model resumes from the cut point, finish_reason=stop, loop continues
 to tool execution / goal eval — no more premature exit]

Before: loop exited with text_response(finish_reason=stop) and 228/9999 budget unused, goal verdict continue arrived too late to do anything. After: loop persists the partial assistant content, asks the model to continue, and keeps making progress against the goal — bounded by the existing length_continue_retries=3 cap so a persistently dead provider still surfaces as a clean partial after 3 attempts.

Checklist

Conventional Commits (fix(streaming):, fix(conversation-loop):, test(streaming):)
3 focused commits, single author
6 new tests pass; adjacent streaming + run_agent suites pass; pre-existing unrelated failures on test_run_agent.py confirmed unchanged on upstream/main
Tested on macOS 15.6 (darwin 24.6.0), Python 3.12.5
No new config keys, no schema migration, no platform-specific calls
#5544 "no duplicate streamed message" contract preserved (stub still reuses delivered content rather than re-emitting)

Infographic

… stub When the API connection drops mid-stream after text deltas have already been delivered, chat_completion_helpers returned a stub response with finish_reason=stop. The conversation loop then classified the stub as a clean text completion (text_response(finish_reason=stop)) and exited with iteration budget remaining — even when the goal-judge verdict came back as "continue" milliseconds later (issue NousResearch#30963). Switch the text-only partial-stream stub to finish_reason=length. The existing length-continuation path (length_continue_retries up to 3, "continue exactly where you left off" prompt, partial parts merged into final_response) then fires automatically: the partial assistant content is persisted, the model is asked to continue from the cut point, and the loop keeps making progress against the goal. The mid-tool-call branch keeps finish_reason=stop on purpose — its user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry rather than auto-replaying a tool call with possible side effects. NousResearch#5544's "no duplicate message" contract is preserved verbatim: the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta. Refs: NousResearch#30963

… stream The length-continue path's user-facing vprint and continuation prompt both told the model "your response was truncated by the output length limit." That's a lie when the stub came from a partial-stream network error (issue NousResearch#30963) — and a lie the model can detect, leading to "I wasn't truncated, I'm done" no-op responses that defeat the continuation entirely. Detect the partial-stream-stub via response.id and swap in: - vprint: "Stream interrupted by network error (finish_reason='length' on partial-stream-stub)" - prompt: "[System: The previous response was cut off by a network error mid-stream. Continue exactly where you left off. Do not restart or repeat prior text. Finish the answer directly.]" Real length truncations still see the original "truncated by output length limit" prompt — the model needs to know which class of failure it's recovering from. Same length_continue_retries=3 budget, truncated_response_parts merging, and final-response stitching infrastructure on both branches. Refs: NousResearch#30963

… contract Three test classes lock in the NousResearch#30963 fix: 1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call through the two recovery branches and asserts: - text-only partial → finish_reason="length" (the new behaviour), - mid-tool-call partial → finish_reason="stop" (unchanged on purpose). 2. TestLengthContinuationPromptBranching — pure-Python check on the branch that picks the continuation prompt by response.id. Locks the network error wording for partial-stream-stub vs. the output-length wording for everything else. 3. TestConversationLoopPartialStreamContinuation — feeds a stub + continuation pair into run_conversation, verifies the loop makes a second API call (instead of exiting with text_response(stop)), confirms the network-error continuation prompt actually reaches the model on call #2, and that final_response stitches both halves. Refs: NousResearch#30963

alt-glitch · 2026-05-23T15:29:26Z

Competing fix for #30963 alongside #30988. This PR is more comprehensive — adds text-only vs mid-tool-call branching and a truthful network-error continuation prompt. #30988 is a simpler one-line fix to the same finish_reason.

xxxigm added 3 commits May 23, 2026 22:05

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 23, 2026

ilonagaja509-glitch mentioned this pull request May 23, 2026

fix(streaming): use finish_reason=length for partial stream stub #30988

Closed

teknium1 merged commit 6cafcf9 into NousResearch:main May 24, 2026
21 checks passed

teknium1 mentioned this pull request May 24, 2026

[Bug]: Partial Stream Misclassified as Clean Completion Causes Premature Loop Exit #30963

Closed

1 task

alt-glitch mentioned this pull request May 25, 2026

Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls #31998

Closed

daimon-nous Bot mentioned this pull request May 25, 2026

fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) #32012

Merged

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(streaming): treat partial-stream stub as length truncation, not clean stop#30998

fix(streaming): treat partial-stream stub as length truncation, not clean stop#30998
teknium1 merged 3 commits into
NousResearch:mainfrom
xxxigm:fix/30963-partial-stream-not-stop

xxxigm commented May 23, 2026 •

edited by teknium1

Loading

Uh oh!

alt-glitch commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

xxxigm commented May 23, 2026 • edited by teknium1 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Infographic

Uh oh!

alt-glitch commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xxxigm commented May 23, 2026 •

edited by teknium1

Loading