fix(streaming): treat partial-stream stub as length truncation, not clean stop #30998
Merged
teknium1 merged 3 commits on May 24, 2026
… stub

When the API connection drops mid-stream after text deltas have already been delivered, chat_completion_helpers returned a stub response with finish_reason=stop. The conversation loop then classified the stub as a clean text completion (text_response(finish_reason=stop)) and exited with iteration budget remaining, even when the goal-judge verdict came back as "continue" milliseconds later (issue NousResearch#30963).

Switch the text-only partial-stream stub to finish_reason=length. The existing length-continuation path (length_continue_retries up to 3, "continue exactly where you left off" prompt, partial parts merged into final_response) then fires automatically: the partial assistant content is persisted, the model is asked to continue from the cut point, and the loop keeps making progress against the goal.

The mid-tool-call branch keeps finish_reason=stop on purpose: its user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry rather than auto-replaying a tool call with possible side effects.

NousResearch#5544's "no duplicate message" contract is preserved verbatim: the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta.

Refs: NousResearch#30963
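The two-branch stub described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in chat_completion_helpers: `StubResponse`, `build_partial_stream_stub`, and the `tool_call_in_progress` flag are hypothetical names. Only the id `partial-stream-stub` and the two finish_reason values come from the PR.

```python
from dataclasses import dataclass

PARTIAL_STREAM_STUB_ID = "partial-stream-stub"  # id named in the PR


@dataclass
class StubResponse:
    """Hypothetical stand-in for the recovery stub's shape."""
    id: str
    content: str
    finish_reason: str


def build_partial_stream_stub(delivered_text: str, tool_call_in_progress: bool) -> StubResponse:
    """Build a recovery stub after a mid-stream connection drop.

    Text-only partials are marked "length" so a length-continuation path
    can pick them up. Mid-tool-call partials keep "stop" so the user,
    not the agent, decides whether to retry a call with side effects.
    """
    finish_reason = "stop" if tool_call_in_progress else "length"
    return StubResponse(
        id=PARTIAL_STREAM_STUB_ID,
        content=delivered_text,  # reuse what was already streamed, never re-emit it
        finish_reason=finish_reason,
    )
```

Keeping the same id on both branches is what lets downstream code tell a network-error stub apart from a genuine length truncation.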
… stream

The length-continue path's user-facing vprint and continuation prompt both told the model "your response was truncated by the output length limit." That's a lie when the stub came from a partial-stream network error (issue NousResearch#30963), and a lie the model can detect, leading to "I wasn't truncated, I'm done" no-op responses that defeat the continuation entirely.

Detect the partial-stream-stub via response.id and swap in:
- vprint: "Stream interrupted by network error (finish_reason='length' on partial-stream-stub)"
- prompt: "[System: The previous response was cut off by a network error mid-stream. Continue exactly where you left off. Do not restart or repeat prior text. Finish the answer directly.]"

Real length truncations still see the original "truncated by output length limit" prompt; the model needs to know which class of failure it's recovering from. Same length_continue_retries=3 budget, truncated_response_parts merging, and final-response stitching infrastructure on both branches.

Refs: NousResearch#30963
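The prompt selection above amounts to a single branch on the response id. A minimal sketch, assuming a helper named `pick_continuation_prompt` (hypothetical, not a function in conversation_loop.py); the network-error wording is quoted from the commit message, while the output-length prompt is only partially quoted in the PR, so its elided tail is left as-is:

```python
from typing import Optional

PARTIAL_STREAM_STUB_ID = "partial-stream-stub"

# Wording quoted from the commit message.
NETWORK_ERROR_PROMPT = (
    "[System: The previous response was cut off by a network error mid-stream. "
    "Continue exactly where you left off. Do not restart or repeat prior text. "
    "Finish the answer directly.]"
)

# The PR quotes only the start of this prompt; the rest is elided here too.
OUTPUT_LENGTH_PROMPT = "[System: your response was truncated by the output length limit…]"


def pick_continuation_prompt(response_id: Optional[str]) -> str:
    """Return the continuation prompt that matches the real failure class."""
    if response_id == PARTIAL_STREAM_STUB_ID:
        return NETWORK_ERROR_PROMPT
    # Every other finish_reason="length" response is a genuine truncation.
    return OUTPUT_LENGTH_PROMPT
```

Any id other than the stub id, including a missing one, falls through to the original wording, so genuine truncations behave as before.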
… contract

Three test classes lock in the NousResearch#30963 fix:

1. TestPartialStreamStubFinishReason: drives _interruptible_streaming_api_call through the two recovery branches and asserts:
   - text-only partial → finish_reason="length" (the new behaviour),
   - mid-tool-call partial → finish_reason="stop" (unchanged on purpose).
2. TestLengthContinuationPromptBranching: pure-Python check on the branch that picks the continuation prompt by response.id. Locks the network error wording for partial-stream-stub vs. the output-length wording for everything else.
3. TestConversationLoopPartialStreamContinuation: feeds a stub + continuation pair into run_conversation, verifies the loop makes a second API call (instead of exiting with text_response(stop)), confirms the network-error continuation prompt actually reaches the model on call #2, and that final_response stitches both halves.

Refs: NousResearch#30963
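The scenario the third test class exercises can be illustrated with a toy loop and a scripted fake model. This is a sketch only: `run_with_continuation`, `scripted_model`, and the short prompt labels are hypothetical stand-ins, and the real run_conversation does far more. The cap of 3 continuation attempts and the stitching of partial parts are the behaviours taken from the PR.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

LENGTH_CONTINUE_RETRIES = 3  # cap named in the PR
PARTIAL_STREAM_STUB_ID = "partial-stream-stub"


@dataclass
class Response:
    id: str
    content: str
    finish_reason: str


def run_with_continuation(call_model: Callable[[List[str]], Response]) -> str:
    """Toy loop: request a continuation while finish_reason is "length"."""
    parts: List[str] = []    # partial texts to stitch together
    prompts: List[str] = []  # continuation prompts sent so far
    retries = 0
    while True:
        response = call_model(prompts)
        parts.append(response.content)
        if response.finish_reason != "length" or retries >= LENGTH_CONTINUE_RETRIES:
            break
        retries += 1
        # Short labels stand in for the two real prompt wordings.
        is_stub = response.id == PARTIAL_STREAM_STUB_ID
        prompts.append("network-error" if is_stub else "output-length")
    return "".join(parts)


def scripted_model(responses: List[Response]) -> Tuple[Callable[[List[str]], Response], List[List[str]]]:
    """Fake model that replays canned responses and records the prompts it saw."""
    calls: List[List[str]] = []
    remaining = iter(responses)

    def call(prompts: List[str]) -> Response:
        calls.append(list(prompts))
        return next(remaining)

    return call, calls


# A stub followed by its continuation, as in the integration test.
call, calls = scripted_model([
    Response(PARTIAL_STREAM_STUB_ID, "Hello, wor", "length"),
    Response("resp-2", "ld!", "stop"),
])
final = run_with_continuation(call)
```

With a provider that keeps dropping the stream, the same loop stops after the initial call plus three continuation attempts and returns whatever text was collected.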
What does this PR do?
Stops the agent loop from exiting with iteration budget remaining when an upstream connection drops mid-stream. Fixes the misclassification described in #30963 where a partial-stream recovery stub was reported as text_response(finish_reason=stop) and the loop terminated even though the goal-judge verdict came back as continue milliseconds later.

The recovery stub returned by chat_completion_helpers.interruptible_streaming_api_call already carries a distinct id (partial-stream-stub), so two surgical fixes are enough:

1. Stub finish_reason (agent/chat_completion_helpers.py): text-only partial → finish_reason="length" so the existing length-continuation path (3 retries, partial parts merged into final_response) fires automatically. Mid-tool-call partials keep finish_reason="stop" on purpose: their user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry, not the agent.
2. Truthful continuation prompt (agent/conversation_loop.py): when the response id is partial-stream-stub, swap the "[System: your response was truncated by the output length limit…]" prompt for a network-error variant ("[System: cut off by a network error mid-stream…]"). Real length truncations still see the original wording. Lying to the model leads to no-op responses ("I wasn't truncated, I'm done") that defeat the continuation.

#5544's "no duplicate message" contract is preserved verbatim: the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta.

Related Issue
Fixes #30963
Type of Change
Changes Made
- agent/chat_completion_helpers.py: split the partial-stream stub into two branches: text-only emits finish_reason="length", mid-tool-call keeps "stop". Logging line updated to match.
- agent/conversation_loop.py: length-handler vprint and continuation prompt now branch on response.id == "partial-stream-stub" to use the network-error wording.
- tests/run_agent/test_partial_stream_finish_reason.py: 6 new tests locking in: stub finish_reason for both branches, prompt-branching truth-table, and an end-to-end run_conversation integration that drives a stub + continuation pair and verifies the loop makes the second API call (not exits) with the network-error prompt reaching the model.

Backwards compatible: the public stub shape is unchanged (still id="partial-stream-stub"), no config keys added, no schema changes. Real finish_reason="length" responses keep the existing prompt and behaviour.

How to Test
End-to-end behaviour after the fix (mirrors the user's #30963 log):
Before: loop exited with text_response(finish_reason=stop) and 228/9999 budget unused; the goal verdict continue arrived too late to do anything.

After: loop persists the partial assistant content, asks the model to continue, and keeps making progress against the goal, bounded by the existing length_continue_retries=3 cap so a persistently dead provider still surfaces as a clean partial after 3 attempts.

Checklist
- fix(streaming):, fix(conversation-loop):, test(streaming):)
- test_run_agent.py confirmed unchanged on upstream/main
- #5544 "no duplicate streamed message" contract preserved (stub still reuses delivered content rather than re-emitting)