fix(cli): prevent duplicate output on stream retry#1704
Conversation
|
I’m not entirely sure why this duplicate stream output happens, but I have run into it a few times. I’ll try to track it down the next time it occurs. In the meantime, it might be worth adding some debug logs first so we can confirm whether this is actually the root cause. |
|
This pull request has had no activity for 60 days and is being marked as stale. It will be closed in another 30 days if no further activity occurs. To keep it open, push a new commit or leave a comment. Maintainers may apply |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
|
Hi @liqiongyu, this PR currently has merge conflicts. Could you please rebase or merge the latest |
3eda310 to
401e080
Compare
|
Updated this PR to address the retry cleanup review feedback. Changes made:
Verification:
Latest head: 401e080 |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Hi @liqiongyu — I tried to reproduce the original duplication scenario end-to-end via tmux against both main and this PR, with a deterministic injection that fires InvalidStreamError(NO_FINISH_REASON) once per process. The PR's regression tests pass, but the actual production-path duplication still reproduces on this branch. Wanted to share the trace before this lands.
How I reproduced
I patched geminiChat.ts to throw an InvalidStreamError once per process, after the inner stream loop completes (mirroring the real processStreamResponse validation path at geminiChat.ts:1111-1124):
// Inserted right after `for await (const chunk of stream) { ... }` in the
// outer retry loop, gated by attempt === 0 and a process-global flag.
if (attempt === 0 && !globalThis.__qwenRetryFired) {
globalThis.__qwenRetryFired = true;
throw new InvalidStreamError(
'forced post-stream retry for visual reproduction',
'NO_FINISH_REASON',
);
}Then ran npm run dev in a 200×50 tmux pane on each branch with the same prompt asking for a 3-paragraph essay separated by blank lines.
Result
main(no fix): duplication clearly visible — two✦markers, two complete essays. Expected.- This PR (with fix): duplication still visible — two
✦markers, two complete essays. Not expected.
Why the regression tests pass but production doesn't
The 4 new tests in useGeminiStream.test.tsx > Retry Handling use mocked stream generators where the Retry event is yielded before any Finished event:
yield Content('abcdef');
yield Retry; // ← retry first
yield Content('abcdef');
yield Finished(STOP);But the real geminiChat.ts retry path looks different. The chunk loop fully drains, each chunk's finishReason causes turn.ts to yield Finished, then processStreamResponse validates and throws, then the outer loop yields RETRY. So the actual production sequence is:
Content → Content → … → Content → Finished(STOP) → Retry → Content → Finished(STOP)
Finished arrives before Retry, not after. None of the regression tests cover this ordering.
Root cause
I added trace logging at the retryDiscardHistoryItemIds.add(id) call sites and inside the Retry handler. Output during attempt 1:
[TRACE] split-flush id=…782 setSize=1 # P1 flushed to <Static>, tracked
[TRACE] split-flush id=…783 setSize=2 # P2 flushed to <Static>, tracked
[TEST PATCH] forcing post-stream NO_FINISH_REASON
[TRACE] Retry fresh-restart: size=0 ids=[] # ← Set was already cleared
The Set is empty by the time the Retry handler runs, so the removeHistoryItemsById && retryDiscardHistoryItemIds.size > 0 guard skips the cleanup. The clear happens at useGeminiStream.ts:1271-1275 (see inline comment).
If I comment out that one .clear() and re-run the same scenario, the trace shows size=2 ids=[…782, …783] at the Retry handler and the duplication mostly disappears.
A second, related gap
After fixing the clear, one paragraph still leaks — the last paragraph of attempt 1. That comes from the new pending-flush block inside handleFinishedEvent:
if (finishReason !== FinishReason.MAX_TOKENS && pendingHistoryItemRef.current) {
addItem(pendingHistoryItemRef.current, userMessageTimestamp); // id discarded
setPendingHistoryItem(null);
}The returned id isn't captured, so this final flushed item can never be removed by a subsequent Retry. Same pattern as the split-flush branches — needs the id added to retryDiscardHistoryItemIds (gated by RETRY_DISCARD_HISTORY_ITEM_TYPES).
Suggested fix
- Drop the
retryDiscardHistoryItemIds.clear()on the non-MAX_TOKENS Finished branch. Let the next attempt's first content/thought event reset the tracking instead, or rely on theRetryhandler itself. - In
handleFinishedEvent, capture the id fromaddItem(pendingHistoryItemRef.current, …)and add it toretryDiscardHistoryItemIdsif the type is inRETRY_DISCARD_HISTORY_ITEM_TYPES. - Add a regression test that mirrors the production ordering —
Content → Finished(STOP) → Retry → Content → Finished(STOP)— and assertsremoveHistoryItemsByIdis called with the IDs flushed during attempt 1, including the one finalized by the firstFinished. The existingshould discard streamed content on Retry to avoid duplicatestest interleaves Content and Retry without an intervening Finished, which is why this gap slipped through.
This also retroactively justifies @tanzhenxin's earlier ask for log-based confirmation of the root cause — the symptom matches but the cause-effect chain has a separate failure mode that the unit tests don't model.
Happy to share the patch + tmux capture files if useful.
— claude-opus-4-7 via Claude Code /qreview
| event.value, | ||
| userMessageTimestamp, | ||
| retryDiscardHistoryItemIds, | ||
| ); |
There was a problem hiding this comment.
[Critical] This clear() wipes attempt 1's tracked IDs before the subsequent Retry event can read them.
In the production path, Finished(STOP) arrives before Retry (because processStreamResponse validation in geminiChat.ts:1111-1124 runs after chunks drain and throws InvalidStreamError, which the outer retry loop then yields as RETRY). When this Retry arrives at line 1297, retryDiscardHistoryItemIds.size === 0 so the removeHistoryItemsById call is skipped — and attempt 1's flushed paragraphs stay in <Static>, producing the duplicate output the PR is supposed to fix.
Reproduced via tmux + a single-shot InvalidStreamError injection in geminiChat.ts (see the main review comment for full repro and trace). Removing this single line restores the fix's intended behavior for the most common Retry path in production.
| ); | |
| waitingForMaxTokensRetry = | |
| event.value.reason === FinishReason.MAX_TOKENS; | |
| if (!waitingForMaxTokensRetry) { | |
| geminiMessageBuffer = ''; | |
| thoughtBuffer = ''; | |
| } |
— claude-opus-4-7 via Claude Code /qreview
Dismissing this approval — end-to-end tmux reproduction shows the fix does not cover the production retry path (Finished(STOP) before Retry). Details in the follow-up review: #1704 (review)
401e080 to
28fe475
Compare
Handle GeminiEventType.Retry by discarding previously streamed partial output. Fixes QwenLM#1591
28fe475 to
9fc36b0
Compare
|
|
||
| if ( | ||
| finishReason !== FinishReason.MAX_TOKENS && | ||
| pendingHistoryItemRef.current |
There was a problem hiding this comment.
[Critical] handleFinishedEvent skips committing/tracking the pending assistant item when finishReason === FinishReason.MAX_TOKENS. The max-token escalation path can emit Finished(MAX_TOKENS) followed by a non-continuation Retry; because that pending item never receives an id in retryDiscardHistoryItemIds, the retry cleanup cannot reliably remove the stale partial output, so truncated output can remain alongside the retry output.
Please either commit and add discardable pending assistant/history items to retryDiscardHistoryItemIds for MAX_TOKENS as well, or explicitly clear/track the pending item in the non-continuation Retry path before the retry starts.
— gpt-5.5 via Qwen Code /review
Fixes #1591.
Core emits GeminiEventType.Retry when a streaming attempt is retried due to invalid/partial output. The interactive UI previously ignored this event, so partial output from the failed attempt remained in the Ink history and the retried attempt re-streamed the same text, causing duplicated messages.
Changes:
Tests: