Skip to content

fix(streaming): treat partial-stream stub as length truncation, not clean stop#30998

Merged
teknium1 merged 3 commits into
NousResearch:mainfrom
xxxigm:fix/30963-partial-stream-not-stop
May 24, 2026
Merged

fix(streaming): treat partial-stream stub as length truncation, not clean stop#30998
teknium1 merged 3 commits into
NousResearch:mainfrom
xxxigm:fix/30963-partial-stream-not-stop

Conversation

@xxxigm

@xxxigm xxxigm commented May 23, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Stops the agent loop from exiting with iteration budget remaining when an upstream connection drops mid-stream. Fixes the misclassification described in #30963 where a partial-stream recovery stub was reported as text_response(finish_reason=stop) and the loop terminated even though the goal-judge verdict came back as continue milliseconds later.

The recovery stub returned by chat_completion_helpers.interruptible_streaming_api_call already carries a distinct id (partial-stream-stub), so two surgical fixes are enough:

  1. Stub finish_reason (agent/chat_completion_helpers.py) — text-only partial → finish_reason="length" so the existing length-continuation path (3 retries, partial parts merged into final_response) fires automatically. Mid-tool-call partials keep finish_reason="stop" on purpose: their user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry, not the agent.

  2. Truthful continuation prompt (agent/conversation_loop.py) — when the response id is partial-stream-stub, swap the "[System: your response was truncated by the output length limit…]" prompt for a network-error variant ("[System: cut off by a network error mid-stream…]"). Real length truncations still see the original wording. Lying to the model leads to no-op responses ("I wasn't truncated, I'm done") that defeat the continuation.

#5544's "no duplicate message" contract is preserved verbatim — the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta.

Related Issue

Fixes #30963

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • agent/chat_completion_helpers.py — split the partial-stream stub into two branches: text-only emits finish_reason="length", mid-tool-call keeps "stop". Logging line updated to match.
  • agent/conversation_loop.py — length-handler vprint and continuation prompt now branch on response.id == "partial-stream-stub" to use the network-error wording.
  • tests/run_agent/test_partial_stream_finish_reason.py6 new tests locking in: stub finish_reason for both branches, prompt-branching truth-table, and an end-to-end run_conversation integration that drives a stub + continuation pair and verifies the loop makes the second API call (not exits) with the network-error prompt reaching the model.

Backwards compatible: the public stub shape is unchanged (still id="partial-stream-stub"), no config keys added, no schema changes. Real finish_reason="length" responses keep the existing prompt and behaviour.

How to Test

# New regression tests (6 tests, ~3s)
./scripts/run_tests.sh tests/run_agent/test_partial_stream_finish_reason.py

# Adjacent suites (streaming infra + run_agent integration), no regressions
./scripts/run_tests.sh tests/run_agent/test_streaming.py \
    tests/run_agent/test_stream_interrupt_retry.py \
    tests/run_agent/test_partial_stream_finish_reason.py \
    tests/run_agent/test_run_agent.py
# expected: all passing (test_run_agent.py has pre-existing sandbox-permission
# failures on darwin 24.6.0 — confirmed unchanged on upstream/main with
# 42 failed, 35 passed, 262 errors before AND after this PR)

End-to-end behaviour after the fix (mirrors the user's #30963 log):

WARNING agent.chat_completion_helpers:
  Partial stream delivered before error; returning length-truncated stub
  with 109 chars of recovered content so the loop can continue from where
  the stream died: peer closed connection without sending complete
  message body (incomplete chunked read)

⚠️  Stream interrupted by network error
   (finish_reason='length' on partial-stream-stub)
↻ Stream interrupted — requesting continuation (1/3)...

[next API call sent with the network-error continuation prompt;
 model resumes from the cut point, finish_reason=stop, loop continues
 to tool execution / goal eval — no more premature exit]

Before: loop exited with text_response(finish_reason=stop) and 228/9999 budget unused, goal verdict continue arrived too late to do anything. After: loop persists the partial assistant content, asks the model to continue, and keeps making progress against the goal — bounded by the existing length_continue_retries=3 cap so a persistently dead provider still surfaces as a clean partial after 3 attempts.

Checklist

  • Conventional Commits (fix(streaming):, fix(conversation-loop):, test(streaming):)
  • 3 focused commits, single author
  • 6 new tests pass; adjacent streaming + run_agent suites pass; pre-existing unrelated failures on test_run_agent.py confirmed unchanged on upstream/main
  • Tested on macOS 15.6 (darwin 24.6.0), Python 3.12.5
  • No new config keys, no schema migration, no platform-specific calls
  • #5544 "no duplicate streamed message" contract preserved (stub still reuses delivered content rather than re-emitting)

Infographic

partial-stream-length-continuation

xxxigm added 3 commits May 23, 2026 22:05
… stub

When the API connection drops mid-stream after text deltas have already
been delivered, chat_completion_helpers returned a stub response with
finish_reason=stop. The conversation loop then classified the stub as a
clean text completion (text_response(finish_reason=stop)) and exited
with iteration budget remaining — even when the goal-judge verdict
came back as "continue" milliseconds later (issue NousResearch#30963).

Switch the text-only partial-stream stub to finish_reason=length. The
existing length-continuation path (length_continue_retries up to 3,
"continue exactly where you left off" prompt, partial parts merged
into final_response) then fires automatically: the partial assistant
content is persisted, the model is asked to continue from the cut
point, and the loop keeps making progress against the goal.

The mid-tool-call branch keeps finish_reason=stop on purpose — its
user-facing warning ("Ask me to retry if you want to continue") asks
the user to drive the retry rather than auto-replaying a tool call
with possible side effects.

NousResearch#5544's "no duplicate message" contract is preserved verbatim: the
partial content is reused, never re-emitted as a fresh API call, so
the user never sees two copies of the same delta.

Refs: NousResearch#30963
… stream

The length-continue path's user-facing vprint and continuation prompt
both told the model "your response was truncated by the output length
limit." That's a lie when the stub came from a partial-stream network
error (issue NousResearch#30963) — and a lie the model can detect, leading to "I
wasn't truncated, I'm done" no-op responses that defeat the
continuation entirely.

Detect the partial-stream-stub via response.id and swap in:

- vprint:   "Stream interrupted by network error
             (finish_reason='length' on partial-stream-stub)"
- prompt:   "[System: The previous response was cut off by a network
             error mid-stream. Continue exactly where you left off.
             Do not restart or repeat prior text. Finish the answer
             directly.]"

Real length truncations still see the original "truncated by output
length limit" prompt — the model needs to know which class of failure
it's recovering from. Same length_continue_retries=3 budget,
truncated_response_parts merging, and final-response stitching
infrastructure on both branches.

Refs: NousResearch#30963
… contract

Three test classes lock in the NousResearch#30963 fix:

1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call
   through the two recovery branches and asserts:
     - text-only partial → finish_reason="length" (the new behaviour),
     - mid-tool-call partial → finish_reason="stop" (unchanged on purpose).

2. TestLengthContinuationPromptBranching — pure-Python check on the branch
   that picks the continuation prompt by response.id. Locks the network
   error wording for partial-stream-stub vs. the output-length wording
   for everything else.

3. TestConversationLoopPartialStreamContinuation — feeds a stub +
   continuation pair into run_conversation, verifies the loop makes a
   second API call (instead of exiting with text_response(stop)),
   confirms the network-error continuation prompt actually reaches the
   model on call #2, and that final_response stitches both halves.

Refs: NousResearch#30963
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 23, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Competing fix for #30963 alongside #30988. This PR is more comprehensive — adds text-only vs mid-tool-call branching and a truthful network-error continuation prompt. #30988 is a simpler one-line fix to the same finish_reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Partial Stream Misclassified as Clean Completion Causes Premature Loop Exit

3 participants