Skip to content

fix(cli): prevent duplicate output on stream retry#1704

Open
liqiongyu wants to merge 1 commit into
QwenLM:mainfrom
liqiongyu:fix/1591-message-duplication
Open

fix(cli): prevent duplicate output on stream retry#1704
liqiongyu wants to merge 1 commit into
QwenLM:mainfrom
liqiongyu:fix/1591-message-duplication

Conversation

@liqiongyu

Copy link
Copy Markdown
Contributor

Fixes #1591.

Core emits GeminiEventType.Retry when a streaming attempt is retried due to invalid/partial output. The interactive UI previously ignored this event, so partial output from the failed attempt remained in the Ink history and the retried attempt re-streamed the same text, causing duplicated messages.

Changes:

  • Add removeItemsById to the history manager.
  • Track streamed history item ids for the current attempt and remove them on Retry.
  • Clear pending buffers/thought state and refresh the static UI.
  • Add regression tests for Retry handling.

Tests:

  • cd packages/cli && npx vitest run src/ui/hooks/useGeminiStream.test.tsx src/ui/hooks/useHistoryManager.test.ts src/ui/App.test.tsx

@tanzhenxin

Copy link
Copy Markdown
Collaborator

I’m not entirely sure why this duplicate stream output happens, but I have run into it a few times. I’ll try to track it down the next time it occurs. In the meantime, it might be worth adding some debug logs first so we can confirm whether this is actually the root cause.

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has had no activity for 60 days and is being marked as stale. It will be closed in another 30 days if no further activity occurs. To keep it open, push a new commit or leave a comment. Maintainers may apply pinned, status/blocked, status/on-hold, or status/ready-for-merge to exempt it from auto-close.

wenshao
wenshao previously approved these changes Apr 24, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review

@wenshao

wenshao commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Hi @liqiongyu, this PR currently has merge conflicts. Could you please rebase or merge the latest main and resolve them when you get a chance? Thanks!

@liqiongyu liqiongyu force-pushed the fix/1591-message-duplication branch 3 times, most recently from 3eda310 to 401e080 Compare April 24, 2026 16:20
@liqiongyu

Copy link
Copy Markdown
Contributor Author

Updated this PR to address the retry cleanup review feedback.

Changes made:

  • Track citation and MAX_TOKENS info items as part of the discarded retry attempt, so stale citation metadata is removed with the partial response.
  • Scope retry discard IDs to the current stream attempt instead of shared hook state, so recursive continuations do not delete already-completed assistant output.
  • Added regression coverage for citation retries, concurrent /btw handling, and attempt-scoped retry cleanup.

Verification:

  • npm run build && npm run typecheck passed locally on Node 20.19.5.
  • GitHub CI is passing across Lint, CodeQL, and the macOS/Ubuntu/Windows test matrix on Node 20/22/24.

Latest head: 401e080

@github-actions github-actions Bot removed the status/stale No activity for extended period label Apr 25, 2026
wenshao
wenshao previously approved these changes Apr 25, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @liqiongyu — I tried to reproduce the original duplication scenario end-to-end via tmux against both main and this PR, with a deterministic injection that fires InvalidStreamError(NO_FINISH_REASON) once per process. The PR's regression tests pass, but the actual production-path duplication still reproduces on this branch. Wanted to share the trace before this lands.

How I reproduced

I patched geminiChat.ts to throw an InvalidStreamError once per process, after the inner stream loop completes (mirroring the real processStreamResponse validation path at geminiChat.ts:1111-1124):

// Inserted right after `for await (const chunk of stream) { ... }` in the
// outer retry loop, gated by attempt === 0 and a process-global flag.
if (attempt === 0 && !globalThis.__qwenRetryFired) {
  globalThis.__qwenRetryFired = true;
  throw new InvalidStreamError(
    'forced post-stream retry for visual reproduction',
    'NO_FINISH_REASON',
  );
}

Then ran npm run dev in a 200×50 tmux pane on each branch with the same prompt asking for a 3-paragraph essay separated by blank lines.

Result

  • main (no fix): duplication clearly visible — two markers, two complete essays. Expected.
  • This PR (with fix): duplication still visible — two markers, two complete essays. Not expected.

Why the regression tests pass but production doesn't

The 4 new tests in useGeminiStream.test.tsx > Retry Handling use mocked stream generators where the Retry event is yielded before any Finished event:

yield Content('abcdef');
yield Retry;            // ← retry first
yield Content('abcdef');
yield Finished(STOP);

But the real geminiChat.ts retry path looks different. The chunk loop fully drains, each chunk's finishReason causes turn.ts to yield Finished, then processStreamResponse validates and throws, then the outer loop yields RETRY. So the actual production sequence is:

Content → Content → … → Content → Finished(STOP) → Retry → Content → Finished(STOP)

Finished arrives before Retry, not after. None of the regression tests cover this ordering.

Root cause

I added trace logging at the retryDiscardHistoryItemIds.add(id) call sites and inside the Retry handler. Output during attempt 1:

[TRACE] split-flush id=…782 setSize=1            # P1 flushed to <Static>, tracked
[TRACE] split-flush id=…783 setSize=2            # P2 flushed to <Static>, tracked
[TEST PATCH] forcing post-stream NO_FINISH_REASON
[TRACE] Retry fresh-restart: size=0 ids=[]       # ← Set was already cleared

The Set is empty by the time the Retry handler runs, so the removeHistoryItemsById && retryDiscardHistoryItemIds.size > 0 guard skips the cleanup. The clear happens at useGeminiStream.ts:1271-1275 (see inline comment).

If I comment out that one .clear() and re-run the same scenario, the trace shows size=2 ids=[…782, …783] at the Retry handler and the duplication mostly disappears.

A second, related gap

After fixing the clear, one paragraph still leaks — the last paragraph of attempt 1. That comes from the new pending-flush block inside handleFinishedEvent:

if (finishReason !== FinishReason.MAX_TOKENS && pendingHistoryItemRef.current) {
  addItem(pendingHistoryItemRef.current, userMessageTimestamp);   // id discarded
  setPendingHistoryItem(null);
}

The returned id isn't captured, so this final flushed item can never be removed by a subsequent Retry. Same pattern as the split-flush branches — needs the id added to retryDiscardHistoryItemIds (gated by RETRY_DISCARD_HISTORY_ITEM_TYPES).

Suggested fix

  1. Drop the retryDiscardHistoryItemIds.clear() on the non-MAX_TOKENS Finished branch. Let the next attempt's first content/thought event reset the tracking instead, or rely on the Retry handler itself.
  2. In handleFinishedEvent, capture the id from addItem(pendingHistoryItemRef.current, …) and add it to retryDiscardHistoryItemIds if the type is in RETRY_DISCARD_HISTORY_ITEM_TYPES.
  3. Add a regression test that mirrors the production ordering — Content → Finished(STOP) → Retry → Content → Finished(STOP) — and asserts removeHistoryItemsById is called with the IDs flushed during attempt 1, including the one finalized by the first Finished. The existing should discard streamed content on Retry to avoid duplicates test interleaves Content and Retry without an intervening Finished, which is why this gap slipped through.

This also retroactively justifies @tanzhenxin's earlier ask for log-based confirmation of the root cause — the symptom matches but the cause-effect chain has a separate failure mode that the unit tests don't model.

Happy to share the patch + tmux capture files if useful.

— claude-opus-4-7 via Claude Code /qreview

event.value,
userMessageTimestamp,
retryDiscardHistoryItemIds,
);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] This clear() wipes attempt 1's tracked IDs before the subsequent Retry event can read them.

In the production path, Finished(STOP) arrives before Retry (because processStreamResponse validation in geminiChat.ts:1111-1124 runs after chunks drain and throws InvalidStreamError, which the outer retry loop then yields as RETRY). When this Retry arrives at line 1297, retryDiscardHistoryItemIds.size === 0 so the removeHistoryItemsById call is skipped — and attempt 1's flushed paragraphs stay in <Static>, producing the duplicate output the PR is supposed to fix.

Reproduced via tmux + a single-shot InvalidStreamError injection in geminiChat.ts (see the main review comment for full repro and trace). Removing this single line restores the fix's intended behavior for the most common Retry path in production.

Suggested change
);
waitingForMaxTokensRetry =
event.value.reason === FinishReason.MAX_TOKENS;
if (!waitingForMaxTokensRetry) {
geminiMessageBuffer = '';
thoughtBuffer = '';
}

— claude-opus-4-7 via Claude Code /qreview

@wenshao wenshao dismissed their stale review April 25, 2026 04:25

Dismissing this approval — end-to-end tmux reproduction shows the fix does not cover the production retry path (Finished(STOP) before Retry). Details in the follow-up review: #1704 (review)

@liqiongyu liqiongyu force-pushed the fix/1591-message-duplication branch from 401e080 to 28fe475 Compare April 25, 2026 06:18
Handle GeminiEventType.Retry by discarding previously streamed partial output.

Fixes QwenLM#1591
@liqiongyu liqiongyu force-pushed the fix/1591-message-duplication branch from 28fe475 to 9fc36b0 Compare April 25, 2026 06:25

if (
finishReason !== FinishReason.MAX_TOKENS &&
pendingHistoryItemRef.current

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] handleFinishedEvent skips committing/tracking the pending assistant item when finishReason === FinishReason.MAX_TOKENS. The max-token escalation path can emit Finished(MAX_TOKENS) followed by a non-continuation Retry; because that pending item never receives an id in retryDiscardHistoryItemIds, the retry cleanup cannot reliably remove the stale partial output, so truncated output can remain alongside the retry output.

Please either commit and add discardable pending assistant/history items to retryDiscardHistoryItemIds for MAX_TOKENS as well, or explicitly clear/track the pending item in the non-continuation Retry path before the retry starts.

— gpt-5.5 via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Message duplication bug

3 participants