fix(core): deduplicate geminiChat recovery continuation text#3966
Conversation
When a provider hits MAX_TOKENS and the model resumes via the recovery loop, the continuation stream sometimes re-sends characters from the end of the previous response as a context anchor. Without deduplication this causes repeated Markdown tables/prose in the final history even if the live UI suppresses them. Add getRecoveryContinuationSuffix / findContainedRecoveryPrefixReplayLength to strip the replayed prefix before appending the continuation parts. Also include the last 1200 chars of the previous response in the recovery prompt so the model can see where it left off. Two new tests cover: - exact suffix overlap (shared recovery suffix and continuation) - contained tail anchor replay (Markdown table prefix replayed mid-text) Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
| function getRecoveryContinuationSuffix( | ||
| previousText: string, | ||
| continuationText: string, | ||
| ): string { |
There was a problem hiding this comment.
[Suggestion] getRecoveryContinuationSuffix 的空输入守卫分支 (previousText.length === 0 || continuationText.length === 0) 未被测试覆盖。若该分支损坏,空字符串场景下的合并行为可能异常。
| ): string { | |
| // 建议添加测试:previousText='' 或 continuationText='',断言返回 continuationText 原样 |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added in 549f27a — prompt-recovery-thought-only exercises the empty-text guard end-to-end (previous turn has only a thought part, so getPlainTextFromParts returns ''). The continuation is asserted to pass through verbatim, and the same test verifies that buildOutputRecoveryMessage does NOT append a <previous_response_suffix> block when there's no plain text to anchor on.
| ): string { | ||
| if (previousText.length === 0 || continuationText.length === 0) { | ||
| return continuationText; | ||
| } |
There was a problem hiding this comment.
[Suggestion] getRecoveryContinuationSuffix 的完全重叠守卫分支 (previousText.endsWith(continuationText) && isSignificantRecoveryOverlap) 未被测试。当续写与上一轮末尾完全相同时应返回 '',但该路径未被直接验证。
| } | |
| // 建议添加测试:previousText='Hello world', continuationText='world'(≥6字节),断言返回 '' |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added in 549f27a — prompt-recovery-full-overlap covers exactly this branch: previous text ends with tail-fragment, continuation is tail-fragment (≥ RECOVERY_OVERLAP_MIN_BYTES), and the test asserts the merged history is just the previous text with no duplication appended (i.e. getRecoveryContinuationSuffix returned '').
| previousModelTurn?.role === 'model' | ||
| ? getPlainTextFromParts(previousModelTurn.parts) | ||
| : ''; | ||
| if (previousText.trim().length === 0) { |
There was a problem hiding this comment.
[Suggestion] buildOutputRecoveryMessage 的空文本分支 (previousText.trim().length === 0) 未被测试。当上一轮模型回复无有效文本时(如仅有 thought part),该分支应返回原始 OUTPUT_RECOVERY_MESSAGE 而不附加 <previous_response_suffix> 块。
| if (previousText.trim().length === 0) { | |
| // 建议添加测试:上一轮 parts 仅为 [{thought: true}] 或空文本,验证恢复消息不包含 <previous_response_suffix> 块 |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added in 549f27a — same prompt-recovery-thought-only test: it captures the recovery user-turn payload from the mock and asserts recoveryMessage contains the Output token limit hit string but does NOT contain <previous_response_suffix>. That covers the previousText.trim().length === 0 branch in buildOutputRecoveryMessage.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
The bug being fixed is real — providers that re-send a context anchor at the start of a max-tokens recovery do leave duplicate Markdown tables in history, and dedup at coalesce time is the right place to do it. The architecture is sound: pure functions, minimal call-site change, defensive bailouts on non-text parts. But the contained-prefix fallback's matching logic is too loose, and it can silently drop legitimate continuation text in plausible cases — which is a worse failure mode than the duplication it's replacing (silent loss is harder for users to notice).
1. The contained-prefix fallback strips legitimate continuation text on common short phrases (severity: high · confidence: very high)
The fallback walks candidate prefix lengths from longest to 1 and returns the first one that (a) is at least 6 bytes (or 4 bytes if it contains a Markdown structural character) AND (b) appears anywhere in the last 4000 characters of the previous response. There's no anchor: no per-line check, no requirement that the match sit near the truncation tail, no Markdown structural anchor. With the floor at 6 bytes, almost any short prose phrase qualifies.
Reproducible false positives running the helper in isolation:
| Previous turn ends with | Continuation begins with | What gets dropped |
|---|---|---|
… In summary, this concludes. |
In summary, the answer is 42. |
In summary, th |
… In conclusion, cats are awesome. |
In conclusion, dogs are also awesome. |
In conclusion, |
… Here is the breakdown: a, b, c. |
Here is the rest of the explanation. |
Here is the |
… I think they are great. I |
I think we should also discuss dogs. |
I think |
| Markdown table in previous turn → continuation re-emits the same header rows + a new data row | The header rows + the leading | of the new data row are stripped — table is mangled |
The greedy "longest match wins" descent makes the loss bigger, not smaller: a 14-byte coincidental match (In conclusion,) wins over a 6-byte one and deletes more. And since the recovery prompt now puts the previous suffix verbatim inside <previous_response_suffix>, the model is actively encouraged to start the continuation with bytes that overlap — exactly the case this helper mishandles.
Tightening the threshold (≥30 bytes for substring matches), anchoring the contained match to a Markdown structural boundary (start of a table row, code fence, heading), or requiring the match to sit at/near the truncation tail rather than anywhere in the 4000-char window would all close this without losing the intended fix for replayed table headers.
2. The suffix-overlap path can over-strip when the model legitimately re-uses a short tail-matching phrase (severity: medium · confidence: high)
Same root cause, less severe. The suffix-overlap loop returns the first overlap that satisfies both previousText.endsWith(overlap) and the significance threshold. Anchoring on endsWith is much stronger than substring-anywhere, so most legitimate continuations are safe. The risk is when the model genuinely starts with bytes that match the truncation tail — e.g. previous ends with Some content.\n and the continuation starts Some content.\nand more! because the model is restating before continuing. The path returns and more! and silently drops the restated opener.
There's also a hard short-circuit when the entire continuation matches the previous tail: a 4-byte structural string like \n# A discards the entire continuation when the previous turn ends with the same bytes. That's the desired behavior for a true duplicate, but offers no escape valve when the continuation genuinely is just a header restart.
Verdict
REQUEST_CHANGES — the contained-prefix fallback in its current form replaces silent duplication with silent loss across a wide class of plausible cases. Tightening the significance threshold and anchoring the match to a structural boundary should close most of the risk in one change. The rest of the PR (architecture, suffix path, recovery-prompt augmentation) is good.
Address review feedback on PR #3966: the contained-prefix fallback in geminiChat recovery dedup was too permissive — a 6-byte minimum plus a 4000-char lookahead window allowed common opener phrases ("In summary,", "In conclusion,", "Here is the …") to silently strip legitimate continuation text whenever they happened to coincide with any substring in the previous turn. Silent loss is a worse failure mode than the duplication we were fixing. Constrain the fallback to its real intended use case — replayed Markdown blocks that providers re-emit at the start of a recovery continuation (table headers, headings, fenced code, lists, blockquotes): - Require the continuation to *open* with a Markdown structural anchor before considering any contained-prefix replay; plain prose openers fall through with no dedup attempted. - Restrict the substring search to the immediate truncation tail (last 400 chars) so a coincidental match far above the truncation point cannot win. - Raise the contained-prefix byte floor (12 bytes) above the suffix- overlap floor. Also add coverage for the previously-untested guard branches (empty input, full-overlap drop, empty previous-text path that skips the <previous_response_suffix> block) and regression tests for the prose-loss scenarios called out in review. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
@tanzhenxin Thanks for the careful review — the prose-loss failure mode you flagged is real and a strict regression on the bug we were trying to fix. Pushed 549f27a addressing concerns #1 and partially #2: #1 (high) — contained-prefix fallback dropping legitimate prose: Tightened on three axes simultaneously, since each one alone leaves a residual class of false positives:
The greedy "longest match wins" descent is preserved because once the structural anchor + tail-lookback + 12-byte floor are all required, the longest match is the right behavior — it's matching an entire replayed block rather than coincidentally matching trailing tokens of a prose phrase. #2 (medium) — suffix-overlap path: Left intact deliberately. The Also addressed @wenshao's three uncovered-branch suggestions inline (regression tests for empty-input, full-overlap drop, and the empty-previous-text path that suppresses |
| * incidental `#` or `|` characters in prose do not count. | ||
| */ | ||
| function startsWithMarkdownStructuralAnchor(text: string): boolean { | ||
| const trimmed = text.replace(/^\n+/, ''); |
There was a problem hiding this comment.
[Suggestion] startsWithMarkdownStructuralAnchor 仅去除 \n,未处理空格/制表符。注释说明「跳过前导空白/换行符」,但 text.replace(/^\n+/, '') 不匹配 或 \t。若提供方在恢复时添加前导空格,结构性锚点检测将失败,导致 contained-prefix 路径完全跳过。
| const trimmed = text.replace(/^\n+/, ''); | |
| const trimmed = text.replace(/^\s+/, ''); |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch — fixed in 7bd70dc. Switched startsWithMarkdownStructuralAnchor to text.replace(/^\s+/, '') so leading spaces and tabs (not just \n) are skipped before the anchor check. The contained-prefix dedup search itself still operates on the unmodified continuation, so this only widens the gate — it doesn't change what gets dedup'd once the gate opens.
|
|
||
| const tail = | ||
| previousText.length > OUTPUT_RECOVERY_TAIL_CHARS | ||
| ? previousText.slice(-OUTPUT_RECOVERY_TAIL_CHARS) |
There was a problem hiding this comment.
[Suggestion] buildOutputRecoveryMessage 截断分支(previousText.slice(-OUTPUT_RECOVERY_TAIL_CHARS),即前一个回复 >1200 字符时)未被测试覆盖。这是一条核心生产路径,但所有新测试的前置响应均较短(最长约 540 字符),slice(-1200) 与多字节字符的交互未被验证。
建议添加测试:使用 'x'.repeat(1300) 作为前置响应,通过捕获 generateContentStream 参数断言 <previous_response_suffix> 块包含被截断的 1200 字符尾部。
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch — added a regression test in 7bd70dc. It uses a 1300-char previous response ('A'.repeat(100) + 'B'.repeat(1200)), captures the recovery user-turn payload, and asserts the <previous_response_suffix> block contains exactly the trailing 1200 chars and that the 100-char head does not leak. This locks down the previousText.slice(-OUTPUT_RECOVERY_TAIL_CHARS) branch.
…l truncation Address wenshao review on PR #3966: - `startsWithMarkdownStructuralAnchor` now strips all leading whitespace (`/^\s+/`) instead of only newlines (`/^\n+/`). Some providers re-emit a recovered Markdown block with leading spaces or tabs, not just newlines; the old regex caused the structural-anchor gate to fail and the contained-prefix dedup path was silently skipped. - Add a regression test for `buildOutputRecoveryMessage` that exercises the `previousText.slice(-OUTPUT_RECOVERY_TAIL_CHARS)` truncation branch with a 1300-char previous response, asserting the <previous_response_suffix> block contains exactly the trailing 1200 chars and that the dropped head does not leak. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
All review feedback addressed: @wenshao's whitespace-in-anchor concern fixed in 7bd70dc, untested >1200-char truncation branch fixed in 7bd70dc, @tanzhenxin's earlier dedup concerns addressed in 549f27a. 66/66 geminiChat tests passing, CI all green. Could you take another look? |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Test coverage gaps: the startsWithMarkdownStructuralAnchor regex has 7 anchor patterns but only 2 (headings, tables) are tested — fenced code, blockquote, unordered/ordered list markers are not exercised. Missing: multi-byte content tests for byte-length thresholds, appendRecoveryContinuationParts with non-text parts, and buildOutputRecoveryMessage with role !== 'model'. Adding these would harden the dedup pipeline against future refactors.
— deepseek-v4-pro via Qwen Code /review
| */ | ||
| function startsWithMarkdownStructuralAnchor(text: string): boolean { | ||
| const trimmed = text.replace(/^\s+/, ''); | ||
| return /^(\|[^\n]*\||#{1,6} |```|>\s|[-*+] |\d+\. )/.test(trimmed); |
There was a problem hiding this comment.
[Suggestion] startsWithMarkdownStructuralAnchor fenced code block regex ``` (three backticks + space) does not match the two most common fenced code opening formats: ```python (language directly after backticks, no space) and ```\n (newline after backticks). This causes startsWithMarkdownStructuralAnchor to return false, which disables the entire contained-prefix dedup path for fenced code blocks — contradicting the stated design intent to handle replayed code blocks.
| return /^(\|[^\n]*\||#{1,6} |```|>\s|[-*+] |\d+\. )/.test(trimmed); | |
| return /^(\|[^\n]*\||#{1,6} |```|>\s|[-*+] |\d+\. )/.test(trimmed); |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
The current regex is actually already what you suggested — there's no trailing space after the three backticks. Looking at the file bytes around line 158:
return /^(\\|[^\\n]*\\||#{1,6} |```|>\\s|[-*+] |\\d+\\. )/.test(trimmed);
The ``` alternative has no space after it, so it correctly matches all three common fenced-code openings:
Verified at the byte level (od -c on lines 156–159) and with a quick Node check. The diff hunk in the review UI may have visually inserted whitespace that doesn't exist in the source. Leaving as-is — happy to revisit if you see a concrete case where it doesn't fire.
| return continuationText; | ||
| } | ||
|
|
||
| function isPlainTextPart(part: Part | undefined): part is Part & { |
There was a problem hiding this comment.
[Suggestion] isPlainTextPart is a near-duplicate of the existing isValidNonThoughtTextPart (line 412), but omits three guards: thoughtSignature, inlineData, and fileData. It also uses stricter equality (part.thought !== true) vs the existing function's truthiness check (!part.thought). Two nearly-identical guard functions with subtly different semantics create a maintenance trap — future bugfixes to one will likely miss the other. Consider wrapping isValidNonThoughtTextPart or adding a comment explaining why the checks intentionally diverge.
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch — fixed in 24582d4. isPlainTextPart now delegates to isValidNonThoughtTextPart, so the recovery-merge path and the consolidated-history path (line 1323) agree on what counts as plain text. The type predicate is preserved at the wrapper level for callers that rely on part.text: string narrowing; the underlying guards (thought, thoughtSignature, function*, inlineData, fileData) now live in one place.
| `${OUTPUT_RECOVERY_MESSAGE}\n\n` + | ||
| 'The previous assistant response ended with this exact suffix. ' + | ||
| 'Do not repeat any line, table row, code line, or prose that already ' + | ||
| 'appears in it; output only text that comes after this suffix:\n\n' + |
There was a problem hiding this comment.
[Suggestion] buildOutputRecoveryMessage embeds raw model output inside <previous_response_suffix>...</previous_response_suffix> pseudo-XML tags without escaping. If the model's own response happens to contain the literal string </previous_response_suffix> (e.g., when generating XML/HTML examples), the recovery prompt structure breaks — the model would see a prematurely closed tag and misinterpret the suffix boundary. Low collision probability, but cheap to fix: escape </ as \<\/ before embedding, or switch to a delimiter with lower collision risk (e.g., a hash-delimited block).
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch — fixed in 24582d4. Added sanitizeRecoverySuffixTail which neutralizes any literal <previous_response_suffix> / </previous_response_suffix> strings inside the tail by inserting a zero-width space (U+200B) between the angle bracket and the rest of the tag. The text remains visually identical to the model and the recovery instruction's intent is preserved, but the literal byte sequence no longer collides with our delimiter scan.
Regression test prompt-recovery-delimiter-collision asserts the recovery prompt has exactly one opening and one closing delimiter even when the previous response contains a raw </previous_response_suffix>. Went with sanitization over a random-nonce delimiter to keep the change minimal and the prompt text deterministic across runs.
Address two review concerns on geminiChat output-recovery: - `isPlainTextPart` was a near-duplicate of `isValidNonThoughtTextPart` with subtly weaker guards (missing thoughtSignature/inlineData/fileData and using `!== true` vs `!part.thought`). Delegate to the shared predicate so the recovery-merge and consolidated-history paths agree on what counts as plain text. - `buildOutputRecoveryMessage` embedded the previous response inside a `<previous_response_suffix>` pseudo-XML block without sanitization. If the model's own truncated output contained the literal closing tag (e.g. while generating XML/HTML examples), the recovery prompt's structure would break. Neutralize literal opening/closing delimiters inside the tail with a zero-width space so the prompt always has exactly one well-formed block; add a regression test that asserts the delimiter pair count stays at 1/1 even when the tail contains a raw `</previous_response_suffix>`. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
| !tail.includes('<previous_response_suffix>') | ||
| ) { | ||
| return tail; | ||
| } |
There was a problem hiding this comment.
[Suggestion] sanitizeRecoverySuffixTail 的开放标签中性化路径(.replace(/<previous_response_suffix>/g, ...))未被测试覆盖。prompt-recovery-delimiter-collision 测试仅验证了闭合标签 </previous_response_suffix> 的中性化。如果开放标签的替换逻辑有 bug,模型输出中的 <previous_response_suffix> 将直接破坏恢复提示的分隔符结构。
| } | |
| // 建议新增测试:previous = 'Tag: <previous_response_suffix> was in the output.' | |
| // 断言 recoveryMessage 中 openCount === 1 且原生开放标签被转换为 <previous_response_suffix> |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch — added a regression test in 9eb7f72. The new prompt-recovery-delimiter-collision-open test mirrors the existing closing-tag test but seeds the previous model turn with a literal <previous_response_suffix> opening tag. It asserts openCount === 1 and closeCount === 1 in the recovery message (so the recovery prompt's structural pair stays unique), checks that the neutralized variant (<previous_response_suffix>, with U+200B) appears inside the embedded tail, and verifies the block still parses with one well-formed match preserving the surrounding prose. Full suite: 68 tests, all green.
The existing prompt-recovery-delimiter-collision test only exercises the closing-tag (`</previous_response_suffix>`) neutralization path. Add a sibling test that emits a literal opening tag in the previous model turn so the opening-tag replace branch is also covered. Asserts exactly one opening/closing delimiter pair in the recovery message and that the neutralized variant (with zero-width space) appears in the embedded tail.
|
All feedback addressed:
68/68 geminiChat tests passing, CI green. Ready for re-review. |
wenshao
left a comment
There was a problem hiding this comment.
Review Summary
Reviewed the deduplication logic added to the output token recovery path in geminiChat.ts. The PR is well-designed with layered defenses (suffix-anchored scan → contained-prefix fallback → prose preservation guard), bounded search windows, and thorough test coverage (9 new tests covering overlap, replay, prose regression, delimiter collisions, and tail truncation).
No critical issues found. Build passes, all 68 tests pass, typecheck and lint are clean.
Key observations (low-confidence suggestions — for human review):
startsWithMarkdownStructuralAnchortable-row regex (\|[^\n]*\|) matches any line with 2+ pipe chars, which could theoretically match prose containing pipe notation. The 12-byte minimum and 400-char tail lookback provide strong guards though.coalesceRecoveryPairsmissingdebugLogger.warnon shape mismatch bail — inconsistent with the file's pattern of logging operational failures.- Leading whitespace in continuation text can bypass suffix-anchored overlap detection for prose continuations (the contained-prefix path handles whitespace but requires Markdown structural anchors).
findContainedRecoveryPrefixReplayLengthincludes()can match Markdown anchors inside code blocks in the previous tail, potentially causing false-positive prefix stripping.- Test gaps: newline edge cases, additional Markdown anchor types (code fences, blockquotes, lists), and content assertion in the existing recovery loop test.
Overall the code is solid and the design is well-thought-out. Good to merge.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Code review notes — see inline comments. Overall LGTM; the conservative anchor-gated dedup design and the 9 new tests (including the two prose-opener / far-substring negative regression tests) are solid. The items below are non-blocking suggestions around readability, one subtle anchor-side check, and constants documentation.
| const OUTPUT_RECOVERY_TAIL_CHARS = 1200; | ||
| const RECOVERY_OVERLAP_MAX_SCAN_CHARS = 4000; | ||
| const RECOVERY_OVERLAP_MIN_BYTES = 6; | ||
| const RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES = 4; |
There was a problem hiding this comment.
nit: these magic numbers deserve a sentence of rationale each, the way RECOVERY_CONTAINED_TAIL_LOOKBACK_CHARS and RECOVERY_CONTAINED_PREFIX_MIN_BYTES already get below. Specifically:
OUTPUT_RECOVERY_TAIL_CHARS = 1200— why 1200? (token budget consideration? empirically enough context?)RECOVERY_OVERLAP_MAX_SCAN_CHARS = 4000— caps the inner loop; worth saying so.RECOVERY_OVERLAP_MIN_BYTES = 6— minimum length before a plain-text overlap is treated as significant (presumably to avoid coincidental short shared suffixes like". "or"the ").RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES = 4— lower floor when the overlap contains Markdown structure chars; explain why structural overlaps get a lower bar.
Future readers will need to adjust these and the reasoning shouldn't have to be reverse-engineered from the tests.
There was a problem hiding this comment.
Good call — done in 3663e8b. Each of the four constants now has a JSDoc block explaining the rationale:
OUTPUT_RECOVERY_TAIL_CHARS = 1200— pragmatic balance: large enough for ~200–400 tokens of trailing prose or a multi-row Markdown table to give the model coherent resume context, small enough to stay well under provider input budgets even combined with the rest of history.RECOVERY_OVERLAP_MAX_SCAN_CHARS = 4000— hard cap on both the suffix-anchored overlap scan and the contained-prefix scan, so recovery dedup stays O(min(prev, cont, 4000)) in iteration count.RECOVERY_OVERLAP_MIN_BYTES = 6— minimum bytes for a plain-text overlap to count as significant; explicitly calls out the". "/"the "/", and "false-positive class your comment named.RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES = 4— lower floor when the overlap contains#/|/backtick/\n, since those don't collide coincidentally the way short prose does.
There was a problem hiding this comment.
Thanks — added a JSDoc block per constant in 3663e8ba. Each rationale captures the angle you flagged:
OUTPUT_RECOVERY_TAIL_CHARS = 1200— large enough for ~200–400 tokens of trailing context (or a multi-row table), small enough to stay under any provider's input budget alongside the rest of history.RECOVERY_OVERLAP_MAX_SCAN_CHARS = 4000— explicit hard cap on the inner loops so dedup staysO(min(previous, continuation, 4000)).RECOVERY_OVERLAP_MIN_BYTES = 6— exactly the". "/"the "false-positive floor you suggested.RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES = 4— explains why structural overlaps get a lower bar ("| a ","## "are almost certainly replayed block markers).
| const prefix = continuationText.slice(0, length); | ||
| if ( | ||
| byteLength(prefix) >= RECOVERY_CONTAINED_PREFIX_MIN_BYTES && | ||
| previousTail.includes(prefix) |
There was a problem hiding this comment.
Subtle: the structural-anchor check at line 184 only enforces that the continuation starts with a Markdown anchor — but previousTail.includes(prefix) here is a plain substring match, so the matched occurrence inside previousTail does not have to begin at a line boundary.
Example: continuation = "### Heading\nfoo bar baz" could match a previousTail that contains "...prose ### Heading\nfoo bar baz" mid-paragraph (perhaps inside a code block discussing Markdown). The 400-char lookback + 12-byte floor make this very unlikely in practice, but if you want full symmetry you could require the match to be at index 0 or preceded by \n in previousTail. Up to you whether the extra strictness is worth it given how narrow the window already is.
There was a problem hiding this comment.
Good catch — tightened in 3663e8b. Added a previousTailContainsAtLineBoundary helper that requires the match inside previousTail to begin at index 0 or immediately after a \n, mirroring the structural-anchor semantics enforced on the continuation side.
A couple of details worth flagging:
- All occurrences are walked (not just the first
indexOf), so a benign mid-paragraph hit doesn't shadow a real line-anchored replay later in the 400-char tail window. Worst case is still bounded by the lookback × scan cap. - This only narrows the gate; it can't widen what gets dedup'd. So the change is strictly safer — no risk of stripping additional continuation text that wasn't already eligible.
Test suite (68 tests) still green.
There was a problem hiding this comment.
Good catch — went ahead and took the extra strictness. 3663e8ba replaces the plain previousTail.includes(prefix) with a new previousTailContainsAtLineBoundary helper that requires the match inside previousTail to begin at index 0 or immediately after \n, mirroring the structural-anchor check on the continuation side.
All occurrences are walked rather than just the first hit, so a benign mid-paragraph match (e.g. inside a fenced code block discussing Markdown) doesn't shadow a real line-anchored replay later in the 400-char tail window. See L246-L262 and the call site at L226.
| for (let length = maxOverlap; length > 0; length -= 1) { | ||
| const overlap = continuationText.slice(0, length); | ||
| if ( | ||
| isSignificantRecoveryOverlap(overlap) && | ||
| previousText.endsWith(overlap) | ||
| ) { | ||
| return continuationText.slice(length); | ||
| } | ||
| } |
There was a problem hiding this comment.
Perf note (non-blocking): this loop is O(n²) — up to RECOVERY_OVERLAP_MAX_SCAN_CHARS (4000) iterations, each doing previousText.endsWith(overlap) (O(m)) plus byteLength(overlap) (also O(m)) inside isSignificantRecoveryOverlap. Worst-case ~16M char ops per recovery event. Fine for the current scale, but worth being aware of if recovery tails ever grow.
If you ever want to optimize: byteLength can be hoisted (the overlap grows monotonically from the previous iteration when scanning down) and the endsWith check could be replaced by precomputing a Z-array / failure function on continuationText and scanning once. Not needed today.
There was a problem hiding this comment.
Agreed it's non-blocking — opted to document rather than refactor in this PR to keep the change focused on review polish. Added an inline comment in 3663e8b above the loop that:
- Names the O(n²) characteristic explicitly (RECOVERY_OVERLAP_MAX_SCAN_CHARS iterations × O(m) per
endsWith+byteLength). - Quantifies the current worst case (~16M char-ops per recovery event at the 4000-char cap) and notes why that's fine today (recovery is rare, cap is small).
- Records the Z-array / failure-function rewrite as the documented next step if the cap ever grows materially.
Happy to do the actual optimization in a follow-up if/when the scan cap becomes a real concern — keeping this PR scoped to the dedup correctness work felt cleaner than mixing in a perf rewrite that isn't needed yet.
There was a problem hiding this comment.
Agreed it's fine at current scale — keeping the simple loop as-is. Did add an inline comment in 3663e8ba making the bound explicit so the next reader doesn't have to derive it:
Worst-case complexity here is O(n²): up to RECOVERY_OVERLAP_MAX_SCAN_CHARS iterations, each calling
previousText.endsWith(overlap)plusbyteLength(overlap)(both O(m)). At the current 4000-char scan cap that is ~16M char-ops per recovery event, which is fine because recovery is rare and the cap is small. If the cap ever grows materially, this can be rewritten with a precomputed Z-array / failure function oncontinuationTextto scan once instead of repeatedly slicing/comparing.
See L285-L291. Recovery only fires after a MAX_TOKENS truncation (rare per session), so the ~16M-op worst case is not on any hot path.
| if (suffix.length > 0) { | ||
| previousLastPart.text += suffix; | ||
| } | ||
| nextParts.shift(); |
There was a problem hiding this comment.
Reads as if nextParts.shift() could be a bug at first glance because it runs even when suffix.length === 0. It's correct — empty suffix means the continuation's first text part was a complete replay of the previous tail and should be dropped — but a one-line comment would save the next reader a double-take, e.g.:
// Always shift the first continuation text part: a non-empty suffix has
// already been appended to previousLastPart, and an empty suffix means
// the part was a pure replay and should be discarded.
nextParts.shift();There was a problem hiding this comment.
Good call — added the explanatory comment in 3663e8b, very close to your suggested wording:
if (suffix.length > 0) {
previousLastPart.text += suffix;
}
// Always drop the first continuation text part: a non-empty suffix has
// already been appended to previousLastPart above, and an empty suffix
// means the part was a pure replay of the previous tail and should be
// discarded so it does not duplicate into history.
nextParts.shift();Saves the double-take exactly where you flagged it.
There was a problem hiding this comment.
Confirmed intentional, and added the clarifying comment you suggested (slightly reworded) in 3663e8ba:
// Always drop the first continuation text part: a non-empty suffix has
// already been appended to previousLastPart above, and an empty suffix
// means the part was a pure replay of the previous tail and should be
// discarded so it does not duplicate into history.
nextParts.shift();See L419-L423. The two cases (non-empty suffix already merged vs. empty suffix = pure replay) both want the part gone, so the unconditional shift() is correct — just non-obvious without the note.
…refix anchor Address PR #3966 review polish items from wenshao: - Add JSDoc rationale to each magic constant (OUTPUT_RECOVERY_TAIL_CHARS, RECOVERY_OVERLAP_MAX_SCAN_CHARS, RECOVERY_OVERLAP_MIN_BYTES, RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES) so future tuning is grounded. - Make the contained-prefix scan symmetric: require the match inside previousTail to begin at index 0 or immediately after a newline, mirroring the structural-anchor check on the continuation side. All occurrences are walked so a benign mid-paragraph hit doesn't shadow a real line-anchored match later in the 400-char tail window. - Document the suffix-anchored overlap loop's O(n^2) bound and the bounded scan cap so the perf characteristic is explicit rather than reverse- engineered. - Explain why appendRecoveryContinuationParts always shifts the first continuation text part even when the dedup suffix is empty (empty suffix means a pure replay that must be discarded). All 68 tests in geminiChat.test.ts still pass; typecheck and lint clean. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
| continuationParts: Part[] | undefined, | ||
| ): Part[] { | ||
| const mergedParts = [...(previousParts ?? [])]; | ||
| const nextParts = [...(continuationParts ?? [])]; |
There was a problem hiding this comment.
[Critical] appendRecoveryContinuationParts 检查 continuationFirstPart(即 nextParts[0])是否为纯文本来决定是否去重。但 processStreamResponse 以 [thoughtPart?, ...consolidatedHistoryParts] 的顺序构建 parts。当思维模型(如 gemini-2.5-pro)从 MAX_TOKENS 恢复时,续写的第一个 part 是 thought part(thought: true),isPlainTextPart 返回 false,整个去重代码块被跳过,重复文本泄漏到历史记录中。
| const nextParts = [...(continuationParts ?? [])]; | |
| const prevTextIdx = findLastIndex(mergedParts, isPlainTextPart); | |
| const contTextIdx = nextParts.findIndex(isPlainTextPart); | |
| if (prevTextIdx >= 0 && contTextIdx >= 0) { | |
| const suffix = getRecoveryContinuationSuffix( | |
| mergedParts[prevTextIdx].text, | |
| nextParts[contTextIdx].text, | |
| ); | |
| if (suffix.length > 0) { | |
| mergedParts[prevTextIdx].text += suffix; | |
| } | |
| nextParts.splice(contTextIdx, 1); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Confirmed real bug — traced the part ordering in processStreamResponse (model turn parts = [thoughtContentPart?, ...consolidatedHistoryParts]), so for thinking models the first continuation part is the recovery turn's thought and the dedup block was being skipped wholesale.
Fixed in 08f2abd by scanning both sides for the plain-text anchor instead of locking onto the boundary indices, and splicing the matched text part instead of shifting the head. Added a regression test (should dedup recovery continuation when the continuation begins with a thought part) that mirrors the exact [thought, text] shape processStreamResponse produces.
| const RECOVERY_CONTAINED_PREFIX_MIN_BYTES = 12; | ||
| // Limit the substring search to the immediate truncation tail so a coincidental | ||
| // match thousands of characters earlier in the previous turn cannot win. | ||
| const RECOVERY_CONTAINED_TAIL_LOOKBACK_CHARS = 400; |
There was a problem hiding this comment.
[Suggestion] 包含前缀去重路径存在多处缺口:(1) RECOVERY_CONTAINED_TAIL_LOOKBACK_CHARS = 400 vs OUTPUT_RECOVERY_TAIL_CHARS = 1200 — 模型在恢复提示中看到 1200 字符的尾部,但 findContainedRecoveryPrefixReplayLength 仅扫描最后 400 字符,导致 -800 到 -401 位置的重放 Markdown 块无法被去重。(2) startsWithMarkdownStructuralAnchor 剥离前导空白进行锚点检查,但前缀匹配循环使用未剥离的文本,导致带前导空白的续写匹配失败。(3) startsWithMarkdownStructuralAnchor 中 blockquote、fenced code、有序/无序列表锚点未被测试覆盖。
建议:(1) 将 RECOVERY_CONTAINED_TAIL_LOOKBACK_CHARS 对齐到 1200,(2) 在 findContainedRecoveryPrefixReplayLength 中使用剥离空白后的文本进行匹配,(3) 补充对应测试用例。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Splitting the three sub-points:
(1) LOOKBACK=400 vs TAIL=1200 — deferring. The asymmetry is intentional, and it's actually load-bearing for the false-positive guard you praised in #discussion_r3230705438 ("the 400-char lookback + 12-byte floor make this very unlikely in practice"). The 1200-char tail is what the recovery prompt sends to the model so it can resume coherently; the 400-char lookback is what we scan for dedup. Widening the dedup window to 1200 would increase the surface for coincidental contained-prefix matches far above the truncation point — exactly the regression that #3 from the earlier review and should not strip prose that coincides with a far-earlier substring of the previous turn test were added to prevent. A replayed Markdown block 800 chars above the truncation tail is rare enough (most providers replay from the immediate tail) that I'd rather leave that duplicate in history than risk eating legitimate prose. Happy to revisit if you have a concrete production trace where the lookback was too narrow.
(2) anchor-trim asymmetry — fixed in 08f2abd. Real bug: startsWithMarkdownStructuralAnchor strips leading whitespace before the regex check, but findContainedRecoveryPrefixReplayLength then matched the un-trimmed continuationText.slice(0, length) against previousTail. A continuation like " ### Heading\nfoo" passed the anchor check but never found "### Heading\nfoo" in the previous tail. Now strips the same leading whitespace before the substring scan, and folds the stripped length back into the returned replayedLength so the caller's slice(replayedLength) invariant still holds. Regression test: should dedup a replayed structural prefix even when the continuation has leading whitespace.
(3) blockquote / fenced code / list anchor tests — deferring. Cheap to add but the structural-anchor regex itself is already covered by the existing table+heading replay test; the remaining branches are pure regex literals with no surrounding logic. Will add in a follow-up if you'd prefer explicit per-anchor coverage.
| /** | ||
| * Lower floor for overlaps that contain Markdown structural characters | ||
| * (`#`, `|`, backtick, newline). Structural anchors are far less likely to | ||
| * collide coincidentally than prose — a 4-byte overlap like `"| a "` or |
There was a problem hiding this comment.
[Suggestion] RECOVERY_OVERLAP_MIN_BYTES = 6 对英文合理(需 6 个 ASCII 字符才可能巧合匹配),但对中文不成立——CJK 字符每个 3 字节,6 字节仅等于 2 个汉字。两个汉字的巧合边界重合非常常见(如「我们」「但是」),可能导致中文对话中合法续写被误删。建议提高到 12 字节,或根据内容语言动态调整。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Confirmed — byteLength uses Buffer.byteLength(text, 'utf8'), so the 6-byte floor genuinely admits a 2-character CJK overlap. Two-character boundary coincidences ("我们" / "但是" / "需要") are extremely common in Chinese prose.
Fixed in 08f2abd by adding a companion code-point floor (RECOVERY_OVERLAP_MIN_CHARS = 4) on top of the byte floor for the prose branch. Effect:
- ASCII: still requires ≥6 chars (gated by the 6-byte floor; 4-byte ASCII never gets through).
- CJK: now requires ≥4 characters (≥12 bytes), so 2-char and 3-char coincidences are rejected.
- Structural anchors are intentionally exempt — the structural branch already governs them and structural collisions are far rarer than prose.
Chose the 4-codepoint floor rather than bumping MIN_BYTES to 12 because the latter would also reject useful 6–11-byte ASCII overlaps. Spreading via [...overlap] to count code points so emoji surrogate pairs don't double-count.
Regression test added (should preserve a coincidental 2-character CJK overlap (byte floor insufficient for CJK)).
| /** | ||
| * Symmetric line-boundary check for the contained-prefix scan: returns true | ||
| * iff `prefix` occurs in `previousTail` starting at index 0 or immediately | ||
| * after a newline. The structural-anchor check on the continuation side only |
There was a problem hiding this comment.
[Suggestion] 整个去重流程(getRecoveryContinuationSuffix、findContainedRecoveryPrefixReplayLength、appendRecoveryContinuationParts、buildOutputRecoveryMessage)中没有任何 debugLogger 调用。如果去重在线上误删或漏删内容,oncall 无法追溯发生了什么。建议在内容被修改的关键点添加 debugLogger.info,至少记录被丢弃/修改的字符数。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Deferring to a follow-up. The observability gap is real, but adding structured logging to the dedup pipeline is a separable concern from the correctness fixes this PR is scoped to, and the right shape (call site, fields, log level, throttling) deserves its own design pass — especially because recovery is rare enough that we want every dedup event logged, not just "info" sampled. Will file a follow-up issue referencing this thread; happy to land it before any release where dedup behavior is user-visible if you'd prefer.
| isSignificantRecoveryOverlap(overlap) && | ||
| previousText.endsWith(overlap) | ||
| ) { | ||
| return continuationText.slice(length); |
There was a problem hiding this comment.
[Suggestion] previousLastPart.text += suffix 原地修改 history 中 part 对象的 text 属性。虽然创建了新数组,但元素是共享引用。若未来有代码缓存 part 引用,将产生难以调试的不一致。建议创建新对象:mergedParts[mergedParts.length - 1] = { ...previousLastPart, text: previousLastPart.text + suffix }
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Fixed in 08f2abd. Replaced the in-place previousLastPart.text += suffix with a fresh-allocated part:
mergedParts[previousTextIndex] = {
...previousTextPart,
text: previousTextPart.text + suffix,
};Currently no downstream caller appears to cache part references across the recovery merge, but the mutation invariant was fragile and the allocation is essentially free, so worth fixing defensively. (The same change happens to be necessary for the [Critical] line-404 fix, since the new code splices into nextParts and we no longer hold a stable boundary index.)
| // Delegate to the shared predicate used by normal history consolidation | ||
| // (see `isValidNonThoughtTextPart` below) so the recovery-merge path and | ||
| // the consolidated-history path agree on what counts as "plain text". | ||
| // Keeping the type predicate here gives callers `part.text: string` |
There was a problem hiding this comment.
[Suggestion] buildOutputRecoveryMessage 通过 getPlainTextFromParts 拼接所有文本 part 来构建「精确后缀」,但若模型轮次有多个被非文本 part 分隔的文本 part,拼接结果与实际末尾不匹配。建议至少用 debugLogger 记录差异,或仅使用最后一个非 thought 文本 part 的尾部。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Deferring with reasoning. Traced when this edge case actually fires:
processStreamResponseconsolidates adjacent text parts (lastPart.text += part.text), so the only way a model turn ends up with multiple non-adjacent text parts is if structure interleaves text → non-text → text within a single response (e.g. text → functionCall → text).- The recovery code path only runs when MAX_TOKENS truncated output. Empirically, model turns that hit MAX_TOKENS while emitting interleaved text+tool-calls are uncommon: most truncation events happen in long prose/table generation, where the parts collapse to a single consolidated text run.
- When the interleaved case does occur, the current concatenation produces a slightly misleading
<previous_response_suffix>(joins disjoint text runs without a separator). That weakens the "exact suffix" instruction but doesn't corrupt the dedup logic, becauseappendRecoveryContinuationPartsoperates on the live text part, not ongetPlainTextFromParts's output.
I'd rather not switch to "last non-thought text part only" in this PR because that also misses recovery context in the more common case (single consolidated text part with leading non-thought parts above it). The right fix is probably to render parts as a structured block in the recovery prompt, which is scope creep for this PR. Will file a follow-up issue referencing this thread.
`appendRecoveryContinuationParts` previously only inspected the boundary parts (last of previous, first of continuation). `processStreamResponse` orders parts as `[thoughtPart?, ...consolidatedHistoryParts]`, so for thinking models the first continuation part is the recovery turn's thought — the plain-text predicate failed on it and the entire dedup block was skipped, leaking the replayed overlap into durable history. Now scan both sides for the plain-text anchor and splice the matched text part rather than shifting the head. Allocate a fresh merged part instead of mutating `mergedParts[i].text` in place so callers caching part references never observe a half-merged turn. Two additional hardening fixes on the overlap path: - `isSignificantRecoveryOverlap` adds a 4-code-point floor on top of the 6-byte floor for prose. CJK characters are 3 UTF-8 bytes each, so the byte-only floor admitted 2-character coincidences like "我们" / "但是" that recur constantly across unrelated Chinese turns. The structural-anchor branch is exempted (those collisions are far rarer and the structural floor already governs them). - `findContainedRecoveryPrefixReplayLength` now strips leading whitespace from the continuation before matching. The structural- anchor check already tolerated leading spaces/tabs (some providers re-emit replayed blocks with extra indentation), but the substring scan still used the un-trimmed prefix and silently failed to match the corresponding `previousTail` occurrence. Adds three regression tests covering: a thinking-model recovery continuation whose first part is a thought, a 2-CJK-character coincidence that must NOT be dedup'd, and a leading-whitespace structural replay that must be dedup'd. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] previousTailContainsAtLineBoundary rejection path (match found mid-paragraph, not at line boundary) is untested. Only the accept path is exercised by existing tests. If the line-boundary check regresses to a plain substring match, legitimate continuation text would be silently dropped with no test failure. Suggested fix: add a test where a structural-anchor prefix appears mid-paragraph in previousTail (e.g., "some text### Heading newline content"), asserting the contained-prefix match is rejected and continuation is preserved verbatim.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| previousText: string, | ||
| continuationText: string, | ||
| ): string { | ||
| if (previousText.length === 0 || continuationText.length === 0) { |
There was a problem hiding this comment.
[Suggestion] The continuationText.length === 0 sub-path of the empty-input guard in getRecoveryContinuationSuffix is untested. The previousText.length === 0 path is covered (via prompt-recovery-thought-only), but a scenario where the continuation text part is empty (e.g., thought-only continuation) would exercise the || short-circuit. If this guard regresses, empty continuation text would flow into the endsWith and scan logic.
Consider adding a test where the continuation turn has only thought parts (no text parts), asserting continuationText passes through verbatim.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Looked at this carefully — the continuationText.length === 0 sub-path is defensive-only and unreachable from the production caller, so a "thought-only continuation" test would actually skip the call entirely rather than exercise the guard. Tracing the flow:
appendRecoveryContinuationParts only invokes getRecoveryContinuationSuffix when both findLastPlainTextPartIndex(mergedParts) >= 0 and nextParts.findIndex(isPlainTextPart) >= 0. A thought-only continuation would have continuationTextIndex < 0, the if (… && continuationTextIndex >= 0) block would be skipped, and getRecoveryContinuationSuffix would never run — the empty-input guard inside it is never hit from production. The same applies to previousText.length === 0 (the round-6 reply pointing at prompt-recovery-thought-only was technically imprecise: that test covers buildOutputRecoveryMessage's previousText.trim().length === 0 branch, not the guard inside getRecoveryContinuationSuffix).
Rather than add a synthetic test that exports an internal helper just to hit defensive-only code, I documented the intent in 556b015. The new JSDoc on getRecoveryContinuationSuffix explicitly calls out that the empty-input guard is defensive only — kept in place so a future refactor or direct unit-test caller bypassing the production filter cannot crash, but not reachable from the live recovery pipeline. If you'd still prefer an export + direct unit test for paranoia, happy to add it — but the JSDoc makes the contract explicit either way.
| // duplicated Markdown tables/prose even if the live UI suppresses them. | ||
| const containedPrefixLength = findContainedRecoveryPrefixReplayLength( | ||
| previousText, | ||
| continuationText, |
There was a problem hiding this comment.
[Suggestion] The newline-normalization path in getRecoveryContinuationSuffix (when replayedPrefix ends with newline, previousText does not, and suffix does not start with newline) is untested. Both existing contained-prefix tests use prefixes that do NOT end with a newline, so the newline-prepend block is never entered. If the three-condition normalization breaks, no test catches it.
Consider adding a test: previousText ends without newline (e.g., ### Section), continuation starts with ### Section followed by newline and more text, asserting coalesced text has no duplicate anchor and the newline separator is preserved.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added a regression test in 556b015. The new prompt-recovery-newline-normalization test exercises exactly the three-condition branch: previous is 'Intro paragraph.\n### Section' (no trailing newline), continuation is '### Section\nbody prose continuation' so replayedPrefix ends with \n, previousText does NOT, and suffix does NOT start with \n. The assertion text === '${previous}\nbody prose continuation' would fail (missing separator → '### Sectionbody prose continuation') if the normalization branch regressed.
| continuationParts: Part[] | undefined, | ||
| ): Part[] { | ||
| const mergedParts = [...(previousParts ?? [])]; | ||
| const nextParts = [...(continuationParts ?? [])]; |
There was a problem hiding this comment.
[Suggestion] appendRecoveryContinuationParts has an implicit, undocumented dependency on processStreamResponse's text-part consolidation (lastPart.text += part.text). It assumes the last plain-text part in history contains ALL text output from that turn. If someone refactors processStreamResponse to remove in-place consolidation (e.g., for immutability), the dedup logic would silently break, comparing continuation against only the last unconsolidated text fragment and missing real overlaps. No compile-time or runtime error would flag this.
Consider adding a prominent comment above appendRecoveryContinuationParts documenting this coupling, and an integration test that verifies dedup works even when history contains multiple small adjacent text parts (bypassing consolidation).
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good call — documented in 556b015. Added a JSDoc block above appendRecoveryContinuationParts titled Coupling with processStreamResponse that explicitly states the function assumes its parts arrays came from processStreamResponse's in-place text-part consolidation (lastPart.text += part.text), notes that only the last plain-text part of previousParts and the first plain-text part of continuationParts are inspected, and warns that any refactor emitting multiple adjacent unconsolidated text parts would silently miss real overlaps. The doc-block closes by noting both functions live in this file precisely so the coupling is reviewable in a single window.
Re: the integration test for multiple adjacent unconsolidated text parts — deferring. Such an array shape is unreachable from the live recovery pipeline (every callsite of appendRecoveryContinuationParts is coalesceRecoveryPairs, which feeds parts produced by processStreamResponse), so the test would need to bypass the recovery loop entirely and call appendRecoveryContinuationParts directly. Exporting an internal helper purely to document a "what if processStreamResponse changes" invariant adds API surface for a hypothetical regression; the JSDoc is the load-bearing signal here. Happy to revisit if the consolidation logic ever moves toward immutability.
| const mergedParts = [...(previousParts ?? [])]; | ||
| const nextParts = [...(continuationParts ?? [])]; | ||
|
|
||
| // `processStreamResponse` orders parts as |
There was a problem hiding this comment.
[Suggestion] appendRecoveryContinuationParts return value structure convention is undocumented. coalesceRecoveryPairs reuses the merged result as continuationParts input for subsequent recovery iterations, relying on the invariant that the returned parts array has the same shape as processStreamResponse output ([thoughtPart?, ...textParts, ...nonTextParts]). If the return shape changes, multi-iteration recovery would fail silently.
Consider documenting this convention in the function's JSDoc.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Documented in 556b015. The new JSDoc on appendRecoveryContinuationParts includes a Return-value shape section that explicitly states the returned array preserves processStreamResponse's output convention ([thoughtPart?, ...consolidatedTextParts, ...nonTextParts]) and that coalesceRecoveryPairs relies on this by feeding the merged result back as previousParts on the next recovery iteration. The block also calls out that multi-iteration recovery dedup would fail silently against the wrong part if the shape ever diverged.
Add JSDoc to getRecoveryContinuationSuffix calling out that its empty-input guard is defensive-only (the production caller already filters both sides), and document appendRecoveryContinuationParts' implicit coupling with processStreamResponse's text-part consolidation plus its return-shape convention that coalesceRecoveryPairs relies on for multi-iteration recovery. Add two regression tests: - mid-paragraph match rejection: a structural anchor that appears in the previous tail but is not preceded by a newline must NOT trigger the contained-prefix strip, so legitimate continuation survives verbatim. - newline-normalization branch: when the replayed prefix ends with \n but the previous tail does not and the suffix does not start with \n, the helper must insert a separator so the coalesced text keeps its block boundary. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Re: review-body suggestion on the The new (Posted the inline replies on the 4 line-level threads separately.) |
wenshao
left a comment
There was a problem hiding this comment.
No new issues found in this incremental update. The JSDoc additions for getRecoveryContinuationSuffix and appendRecoveryContinuationParts are well-written and accurately document the defensive guards, coupling contracts, and return-value conventions. The two new test cases (line-boundary-reject and newline-normalization) cleanly cover the remaining untested branches identified in prior reviews.
All 73 tests pass, tsc and eslint are clean. LGTM! ✅
— DeepSeek/deepseek-v4-pro via Qwen Code /review
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
The prior round's prose-loss critical is solidly closed via the three-axis tightening (structural-anchor gate + 12-byte floor + 400-char lookback + line-boundary check) — every false-positive shape from last round now passes through verbatim. @wenshao's 20 inline comments are largely addressed; on the ones that didn't apply, the pushback was correct (the fenced-code regex does match ```python, ```\n, and bare forms). One new correctness item below, plus two low-severity follow-ups (W5's 400-vs-1200 lookback asymmetry is still addressed in docs only; the closing-fence arm of the structural-anchor check is slightly over-broad).
1. Thought ordering is corrupted when a thinking-model continuation is [thought, text] (severity: medium · confidence: high)
The fix scans past the leading thought to find the text part, splices that text out, and appends the remaining continuation parts after the merged previous turn. The leftover thought ends up at the end: durable history becomes [..., previousText + suffix, recoveryThought], with the thought trailing the visible text it conceptually generated. Thinking-model providers (Gemini 2.5+, Anthropic, OpenAI o-series) validate thought-signature provenance and expect thoughts to precede their content. The existing thought-first regression test pins the joined non-thought text but not structural position, so this isn't caught. Fix shape: keep the continuation's pre-text non-text parts with the folded suffix (thought ends up before merged text), or drop them.
Verdict
COMMENT — issue 1 is the only must-address before merge.
| const trimmed = text.replace(/^\s+/, ''); | ||
| return /^(\|[^\n]*\||#{1,6} |```|>\s|[-*+] |\d+\. )/.test(trimmed); | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] startsWithMarkdownStructuralAnchor table-row regex /^\|[^\n]*\|/ matches any line starting with | containing another | — no column-count or whitespace requirement. This contradicts the JSDoc claim that "incidental | characters in prose do not count." Prose like |expression| more text (math/technical writing) passes the anchor gate.
Impact: If continuation starts with a pipe-containing prose line and that same text appears at a line boundary in the last 400 chars of the previous response, the contained-prefix path would silently strip it. Mitigated by the 12-byte minimum but not eliminated.
| // Tighten: require ≥3 pipes (minimum GFM table row) or a separator-row pattern | |
| return /^(\|[^\n]*\|[^\n]*\||\|[\s\-:]+\||#{1,6} |```|>\s|[-*+] |\d+\. )/.test(trimmed); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Fixed in 25ea94e.
Tightened the table-row alternation to your suggested form: \|[^\n]*\|[^\n]*\| (≥3 pipes = real GFM row with ≥2 cells) OR \|[\s\-:]+\| (separator row). The JSDoc now explicitly documents why a bare |expression| is rejected.
Added a regression test (prompt-recovery-single-cell-pipe-prose) that constructs the exact scenario you described: continuation starts with |expression| evaluates to a scalar value. and that fragment appears at a line boundary mid-tail of the previous response (so the suffix-anchored scan doesn't catch it, only the contained-prefix path could fire). Confirmed by temporarily reverting the regex — the test fails with a silent strip of |expression| evaluates to a scalar value. , and passes once the regex is tightened.
| return Buffer.byteLength(text, 'utf8'); | ||
| } | ||
|
|
||
| function isSignificantRecoveryOverlap(overlap: string): boolean { |
There was a problem hiding this comment.
[Suggestion] isSignificantRecoveryOverlap character class [#|\\n]matches#, `` ``, |, or `\n` anywhere in the overlap string. This means prose fragments like `"C# programming"` (15 bytes), `"tag #xyz"` (9 bytes), or `"see `backtick`"` (21 bytes) qualify for the lowered 4-byte structural floor instead of the 6-byte prose floor.
Impact: Minimal — the gap is only 2 bytes and coincidence at truncation boundaries is rare. This is primarily a classification-accuracy concern: the implementation is looser than the JSDoc's "structural anchors" implies.
Suggested fix: Scope to Markdown contexts (e.g., /(^|\n)#{1,6}\s/ for headings) or add a comment acknowledging the intentional over-classification and its negligible risk.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Documented in 25ea94e (no behavior change).
Taking your "add a comment acknowledging the intentional over-classification" path rather than scoping to Markdown contexts. Reasoning: the 4-byte structural floor vs the 6-byte prose floor only differs by a 2-byte window. Realistic prose at 4–5 bytes containing #/|/`/\n (e.g. "C#dev", "a|b|c") qualifying structurally requires that exact fragment to be identical at the truncation boundary on both sides — extraordinarily rare in practice, and the prose floor (6 bytes + 4 chars) catches anything longer. Tightening the char-class would add code complexity without changing real-world recovery outcomes.
The new inline comment makes the trade-off explicit so future readers don't have to reverse-engineer the intent.
Tightens `startsWithMarkdownStructuralAnchor`'s table-row alternation so a bare `|expression|` (2 pipes) in technical prose no longer qualifies as a Markdown block anchor — real GFM table rows have ≥3 pipes (≥2 cells) or a separator row like `|---|`. Without this, prose continuation starting with a 2-pipe expression that re-appears at a line boundary mid-tail of the previous response would be silently stripped by the contained-prefix path, contradicting the JSDoc's stated invariant that "incidental `|` characters in prose do not count." Also adds an inline comment to `isSignificantRecoveryOverlap` documenting why the structural-class detection (`[#|`\n]`) is intentionally loose — the 2-byte gap between the 4-byte structural floor and the 6-byte prose floor only matters for 4–5 byte fragments that coincide on both sides of a truncation boundary, which is far rarer than the structural-replay scenarios the lower floor exists to catch. Adds a regression test asserting that a continuation opening with `|expression| ...` is left intact even when it matches at a line boundary in the previous tail. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
Review completed with one non-blocking suggestion. Deterministic checks reported no changed-file findings, CI is passing, and the targeted packages/core geminiChat.test.ts run passed (74/74).
— gpt-5.5 via Qwen Code /review
| * pattern is not a valid GFM table row anyway. | ||
| */ | ||
| function startsWithMarkdownStructuralAnchor(text: string): boolean { | ||
| const trimmed = text.replace(/^\s+/, ''); |
There was a problem hiding this comment.
[Suggestion] This tightened table-row anchor now excludes valid one-column GFM tables. A row like | Name | has only two pipes, so it fails this branch, while the following separator row | --- | is accepted only if the replay starts there. If a provider resumes by replaying the header of a one-column table, the contained-prefix dedup path will skip it and the header/table can still be duplicated in durable history. Consider treating a two-pipe row as structural when it is immediately followed by a valid separator row, so bare prose like |expression| remains rejected but | Name |\n| --- | still dedups.
— gpt-5.5 via Qwen Code /review
Adds a regression test for @tanzhenxin's review comment: the existing `prompt-recovery-thinking-continuation` test only asserts joined non- thought text, so a regression where the recovery turn's leading thought ends up *after* the merged text part slips through. The new test explicitly asserts `thoughtIdx < mergedTextIdx` in the final history entry. Thinking-model providers (Gemini 2.5+, Anthropic, OpenAI o-series) validate thought-signature provenance and expect a thought to precede the content it generated; without an ordering assertion the dedup path could silently violate that invariant. The new test fails on the current implementation (`appendRecoveryContinuationParts` appends the leftover leading thought at the end of the part list). Fix follows in a separate commit so the red → green transition is reviewable. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The recovery dedup path in `appendRecoveryContinuationParts` previously spliced only the matched continuation text part out of `nextParts` and appended the leftover parts (including any leading thought) after the merged text. For thinking-model providers (Gemini 2.5+, Anthropic, OpenAI o-series) that validate thought-signature provenance, this violated the invariant that a thought precedes the content it generated: durable history ended up as `[..., previousText + suffix, recoveryThought]`, with the recovery turn's thought trailing its own text. Hoist any non-text parts that preceded the matched text on the continuation side (typically the recovery turn's thought) into `mergedParts` directly before the merged text part. Trailing non-text parts (tool calls etc.) keep their position via the final concat. Existing `prompt-recovery-thinking-continuation` test still passes because it only asserts joined non-thought text; the new `...-order` test now passes as well. Reported by @tanzhenxin in PR review on commit 556b015. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
@tanzhenxin Re: Issue 1 (thought-ordering on Repro confirmed at byte level: with the previous head ( Fix: went with your option (1) — hoist the continuation's pre-text non-text parts into Existing tests: all 75 in Re: the two low-severity follow-ups you mentioned (W5's 400-vs-1200 lookback asymmetry doc-only response, and the closing-fence arm being slightly over-broad) — happy to take either in this PR if you'd like; my read is they're separable. Let me know. |
wenshao
left a comment
There was a problem hiding this comment.
Incremental review of 25ea94e1..a916c6886: the appendRecoveryContinuationParts hoisting fix correctly repositions the recovery turn's leading non-text parts (thoughts) before the merged text, resolving the thought-signature provenance issue for thinking-model providers. The new test should keep the recovery thought before the merged text part explicitly validates structural ordering. All 75 tests pass, tsc/eslint clean. One Nice to have micro-optimization noted (double array shift could be merged into single splice) but not blocking. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
Maintainer test report — PR #3966Built and validated locally in a dedicated This PR has gone through multiple rounds of review with extensive feedback on the dedup logic's false-positive risk. The test focus below is deliberately weighted toward verifying that the latest revision actually closes those concerns, not just that the happy paths still work. Environment
Results
Targeted verification — every prior review concern has a passing testTanzhenxin's Round-1 review flagged the contained-prefix fallback as silently dropping legitimate continuation text on common short phrases. I re-ran the tests that exist because of that critique, plus the Round-2 thought-ordering critical: Every shape from tanzhenxin's "prose loss" table (the Bundle proof — dedup pipeline is shippedIn
All non-zero → none was dead-code-eliminated; the layered defense (suffix-anchored scan → contained-prefix fallback gated by structural anchor + line boundary + 12-byte floor + 400-char lookback) is actually compiled into the binary. Risk assessmentLow-medium risk. The behavioral surface is large and the failure mode of an over-eager dedup (silent text loss) is harder for users to notice than the duplication it replaces — that's the reason this PR went through 11 commits and ~5 review rounds. The final revision is conservative:
Mergeability statusGitHub reports Not covered locally
Reproducegit fetch origin pull/3966/head:pr-3966 && git checkout pr-3966
npm install
cd packages/core
npx vitest run src/core/geminiChat.test.ts --no-coverage # 75/75
npx vitest run src/core/geminiChat.test.ts \
-t "prose|line-boundary|coalesce|opener|far-earlier|thought before|single-cell pipe|previous_response_suffix" \
--no-coverage --reporter=verbose # 11 targeted guard tests
npm run typecheck
cd ../.. && npx eslint packages/core/src/core/geminiChat.ts \
packages/core/src/core/geminiChat.test.tsRecommendation: safe to merge once a maintainer (re-)dismisses the stale CHANGES_REQUESTED review. The architecture is right (pure helpers + bounded windows + multi-axis anchoring), the false-positive surface that the review pushed back on is now explicitly pinned by regression tests, and the binary contains the fix. |
LaZzyMan
left a comment
There was a problem hiding this comment.
Review
This addresses a real, reproducible bug, and the layered defenses — structural-anchor gating on the contained-prefix path, dual byte/codepoint floors that handle CJK, the line-boundary check, the <previous_response_suffix> prompt, and the thought-signature ordering fix — are genuinely careful engineering. The 27+ regression tests pin the edges down precisely. This should merge.
I do want to flag a forward-looking architectural concern, since I think it's worth raising even though it doesn't block this PR. Almost all of the complexity here exists because coalesceRecoveryPairs collapses [modelTruncated, userRecovery, modelContinuation] into a single merged assistant turn. Once you decide to physically concatenate the two assistant parts in history, any replay the provider emits becomes real duplication inside the stored text — and the only fix is to model what each provider does at the boundary. That makes geminiChat.ts the place where provider-quirk knowledge has to live, and the surface is open-ended (DeepSeek's table-header replay today, the next provider's pattern tomorrow). Each new pattern adds a branch and a tuned constant; nothing ever gets to come back out.
claude-code takes a different shape: it keeps the three messages separate and flags the recovery user turn as isMeta, letting the chat-recording, compaction, export, and rewind layers each filter meta turns for their own concerns. The UI layer renders the continuation seamlessly via the existing buffer-keep logic. geminiChat then doesn't need any provider-specific knowledge of replay shapes at all — the duplication stops being a problem because it's no longer concatenated. From a quick read of the qwen-code codebase, that refactor looks like ~80 lines spread across the layers that already own each concern, and the dedup helpers introduced here could come back out over time.
This PR is the right fix under the current architecture and it shouldn't wait on a redesign. If you're open to it, I'm happy to file a tracking issue with a concrete sketch so we can discuss whether the merging assumption is worth revisiting separately.
Verdict
APPROVE — fix is correct and well-tested; the architectural note is for a follow-up discussion, not a blocker.
already fixed the issue, and two reviewer has approved the pr, I will dismiss the and merge the pr
* fix(core): deduplicate geminiChat recovery continuation text When a provider hits MAX_TOKENS and the model resumes via the recovery loop, the continuation stream sometimes re-sends characters from the end of the previous response as a context anchor. Without deduplication this causes repeated Markdown tables/prose in the final history even if the live UI suppresses them. Add getRecoveryContinuationSuffix / findContainedRecoveryPrefixReplayLength to strip the replayed prefix before appending the continuation parts. Also include the last 1200 chars of the previous response in the recovery prompt so the model can see where it left off. Two new tests cover: - exact suffix overlap (shared recovery suffix and continuation) - contained tail anchor replay (Markdown table prefix replayed mid-text) Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): tighten contained-prefix recovery dedup to avoid prose loss Address review feedback on PR #3966: the contained-prefix fallback in geminiChat recovery dedup was too permissive — a 6-byte minimum plus a 4000-char lookahead window allowed common opener phrases ("In summary,", "In conclusion,", "Here is the …") to silently strip legitimate continuation text whenever they happened to coincide with any substring in the previous turn. Silent loss is a worse failure mode than the duplication we were fixing. Constrain the fallback to its real intended use case — replayed Markdown blocks that providers re-emit at the start of a recovery continuation (table headers, headings, fenced code, lists, blockquotes): - Require the continuation to *open* with a Markdown structural anchor before considering any contained-prefix replay; plain prose openers fall through with no dedup attempted. - Restrict the substring search to the immediate truncation tail (last 400 chars) so a coincidental match far above the truncation point cannot win. - Raise the contained-prefix byte floor (12 bytes) above the suffix- overlap floor. Also add coverage for the previously-untested guard branches (empty input, full-overlap drop, empty previous-text path that skips the <previous_response_suffix> block) and regression tests for the prose-loss scenarios called out in review. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): handle leading whitespace in structural anchor + cover tail truncation Address wenshao review on PR #3966: - `startsWithMarkdownStructuralAnchor` now strips all leading whitespace (`/^\s+/`) instead of only newlines (`/^\n+/`). Some providers re-emit a recovered Markdown block with leading spaces or tabs, not just newlines; the old regex caused the structural-anchor gate to fail and the contained-prefix dedup path was silently skipped. - Add a regression test for `buildOutputRecoveryMessage` that exercises the `previousText.slice(-OUTPUT_RECOVERY_TAIL_CHARS)` truncation branch with a 1300-char previous response, asserting the <previous_response_suffix> block contains exactly the trailing 1200 chars and that the dropped head does not leak. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): unify plain-text predicate and harden recovery delimiter Address two review concerns on geminiChat output-recovery: - `isPlainTextPart` was a near-duplicate of `isValidNonThoughtTextPart` with subtly weaker guards (missing thoughtSignature/inlineData/fileData and using `!== true` vs `!part.thought`). Delegate to the shared predicate so the recovery-merge and consolidated-history paths agree on what counts as plain text. - `buildOutputRecoveryMessage` embedded the previous response inside a `<previous_response_suffix>` pseudo-XML block without sanitization. If the model's own truncated output contained the literal closing tag (e.g. while generating XML/HTML examples), the recovery prompt's structure would break. Neutralize literal opening/closing delimiters inside the tail with a zero-width space so the prompt always has exactly one well-formed block; add a regression test that asserts the delimiter pair count stays at 1/1 even when the tail contains a raw `</previous_response_suffix>`. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): cover opening-tag branch of sanitizeRecoverySuffixTail The existing prompt-recovery-delimiter-collision test only exercises the closing-tag (`</previous_response_suffix>`) neutralization path. Add a sibling test that emits a literal opening tag in the previous model turn so the opening-tag replace branch is also covered. Asserts exactly one opening/closing delimiter pair in the recovery message and that the neutralized variant (with zero-width space) appears in the embedded tail. * docs(core): document recovery-dedup constants and tighten contained-prefix anchor Address PR #3966 review polish items from wenshao: - Add JSDoc rationale to each magic constant (OUTPUT_RECOVERY_TAIL_CHARS, RECOVERY_OVERLAP_MAX_SCAN_CHARS, RECOVERY_OVERLAP_MIN_BYTES, RECOVERY_STRUCTURAL_OVERLAP_MIN_BYTES) so future tuning is grounded. - Make the contained-prefix scan symmetric: require the match inside previousTail to begin at index 0 or immediately after a newline, mirroring the structural-anchor check on the continuation side. All occurrences are walked so a benign mid-paragraph hit doesn't shadow a real line-anchored match later in the 400-char tail window. - Document the suffix-anchored overlap loop's O(n^2) bound and the bounded scan cap so the perf characteristic is explicit rather than reverse- engineered. - Explain why appendRecoveryContinuationParts always shifts the first continuation text part even when the dedup suffix is empty (empty suffix means a pure replay that must be discarded). All 68 tests in geminiChat.test.ts still pass; typecheck and lint clean. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): scan recovery parts for plain-text + CJK-safe overlap floor `appendRecoveryContinuationParts` previously only inspected the boundary parts (last of previous, first of continuation). `processStreamResponse` orders parts as `[thoughtPart?, ...consolidatedHistoryParts]`, so for thinking models the first continuation part is the recovery turn's thought — the plain-text predicate failed on it and the entire dedup block was skipped, leaking the replayed overlap into durable history. Now scan both sides for the plain-text anchor and splice the matched text part rather than shifting the head. Allocate a fresh merged part instead of mutating `mergedParts[i].text` in place so callers caching part references never observe a half-merged turn. Two additional hardening fixes on the overlap path: - `isSignificantRecoveryOverlap` adds a 4-code-point floor on top of the 6-byte floor for prose. CJK characters are 3 UTF-8 bytes each, so the byte-only floor admitted 2-character coincidences like "我们" / "但是" that recur constantly across unrelated Chinese turns. The structural-anchor branch is exempted (those collisions are far rarer and the structural floor already governs them). - `findContainedRecoveryPrefixReplayLength` now strips leading whitespace from the continuation before matching. The structural- anchor check already tolerated leading spaces/tabs (some providers re-emit replayed blocks with extra indentation), but the substring scan still used the un-trimmed prefix and silently failed to match the corresponding `previousTail` occurrence. Adds three regression tests covering: a thinking-model recovery continuation whose first part is a thought, a 2-CJK-character coincidence that must NOT be dedup'd, and a leading-whitespace structural replay that must be dedup'd. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * docs(core): cover recovery-dedup line-boundary + normalization branches Add JSDoc to getRecoveryContinuationSuffix calling out that its empty-input guard is defensive-only (the production caller already filters both sides), and document appendRecoveryContinuationParts' implicit coupling with processStreamResponse's text-part consolidation plus its return-shape convention that coalesceRecoveryPairs relies on for multi-iteration recovery. Add two regression tests: - mid-paragraph match rejection: a structural anchor that appears in the previous tail but is not preceded by a newline must NOT trigger the contained-prefix strip, so legitimate continuation survives verbatim. - newline-normalization branch: when the replayed prefix ends with \n but the previous tail does not and the suffix does not start with \n, the helper must insert a separator so the coalesced text keeps its block boundary. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): tighten table-row anchor + document structural-class scope Tightens `startsWithMarkdownStructuralAnchor`'s table-row alternation so a bare `|expression|` (2 pipes) in technical prose no longer qualifies as a Markdown block anchor — real GFM table rows have ≥3 pipes (≥2 cells) or a separator row like `|---|`. Without this, prose continuation starting with a 2-pipe expression that re-appears at a line boundary mid-tail of the previous response would be silently stripped by the contained-prefix path, contradicting the JSDoc's stated invariant that "incidental `|` characters in prose do not count." Also adds an inline comment to `isSignificantRecoveryOverlap` documenting why the structural-class detection (`[#|`\n]`) is intentionally loose — the 2-byte gap between the 4-byte structural floor and the 6-byte prose floor only matters for 4–5 byte fragments that coincide on both sides of a truncation boundary, which is far rarer than the structural-replay scenarios the lower floor exists to catch. Adds a regression test asserting that a continuation opening with `|expression| ...` is left intact even when it matches at a line boundary in the previous tail. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): pin recovery thought-before-text ordering Adds a regression test for @tanzhenxin's review comment: the existing `prompt-recovery-thinking-continuation` test only asserts joined non- thought text, so a regression where the recovery turn's leading thought ends up *after* the merged text part slips through. The new test explicitly asserts `thoughtIdx < mergedTextIdx` in the final history entry. Thinking-model providers (Gemini 2.5+, Anthropic, OpenAI o-series) validate thought-signature provenance and expect a thought to precede the content it generated; without an ordering assertion the dedup path could silently violate that invariant. The new test fails on the current implementation (`appendRecoveryContinuationParts` appends the leftover leading thought at the end of the part list). Fix follows in a separate commit so the red → green transition is reviewable. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): keep recovery thought before merged text part The recovery dedup path in `appendRecoveryContinuationParts` previously spliced only the matched continuation text part out of `nextParts` and appended the leftover parts (including any leading thought) after the merged text. For thinking-model providers (Gemini 2.5+, Anthropic, OpenAI o-series) that validate thought-signature provenance, this violated the invariant that a thought precedes the content it generated: durable history ended up as `[..., previousText + suffix, recoveryThought]`, with the recovery turn's thought trailing its own text. Hoist any non-text parts that preceded the matched text on the continuation side (typically the recovery turn's thought) into `mergedParts` directly before the merged text part. Trailing non-text parts (tool calls etc.) keep their position via the final concat. Existing `prompt-recovery-thinking-continuation` test still passes because it only asserts joined non-thought text; the new `...-order` test now passes as well. Reported by @tanzhenxin in PR review on commit 556b015. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Problem
When a provider hits
MAX_TOKENSmid-response, the recovery loop injects a user turn and the model resumes. Some providers start the continuation stream by re-sending a few characters (or an entire table/section) from the end of the previous response as a context anchor.Without deduplication this replayed prefix gets appended verbatim to history, causing repeated Markdown tables and prose in all subsequent turns — even if the live UI suppressed it visually.
Fix
Two new helpers in
geminiChat.ts:getRecoveryContinuationSuffix— strips a suffix-overlap between the end of the previous model turn and the start of the continuation (the common case: provider re-sends the last sentence as a context bridge).findContainedRecoveryPrefixReplayLength— handles the less common case where the provider re-sends a chunk that appeared somewhere in the tail of the previous response (e.g. a full Markdown table header), not necessarily the exact last bytes.appendRecoveryContinuationPartscalls both and splices the deduped suffix intoprecedingModel.partsinstead of naively concatenating.The recovery prompt also now includes the last 1 200 chars of the previous response (
buildOutputRecoveryMessage) so the model can see its stopping point.Tests
Two new cases in
geminiChat.test.ts:should coalesce overlapping recovery continuation text— exact shared suffixshould coalesce recovery text that replays a previous tail anchor— Markdown table prefix replayed as anchor mid-textAll 61 tests pass.
Scope
Pure
packages/corechange. No CLI/rendering changes. Independent of the ink 7 upgrade and virtual viewport work.🤖 Generated with Qwen Code