fix(core): normalize cumulative OpenAI stream deltas to suffixes#3896
Conversation
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Additional suggestions (no specific diff line):
- [Suggestion] Missing reasoning delta normalization test — 3 new tests cover text content only. The reasoning path (converter.ts:1087, same
normalizeStreamingTextDeltacall withreasoningDeltaState) has zero test coverage. Add a test sending cumulativereasoning_contentdeltas and asserting suffix deduplication. - [Suggestion] Missing cumulative mode exit test — No test covers the
cumulativeMode = falsetransition path (converter.ts:93). Add a test: establish cumulative mode, then send a non-matching chunk, verify state exits correctly.
| if (state.cumulativeMode) { | ||
| if (rawDelta.startsWith(state.emittedText)) { | ||
| const suffix = rawDelta.slice(state.emittedText.length); | ||
| state.emittedText = rawDelta; |
There was a problem hiding this comment.
[Critical] Cumulative mode causes O(n²) performance: each chunk performs rawDelta.startsWith(state.emittedText) (O(n) scan over accumulated text) and state.emittedText = rawDelta (full string copy). For a 50KB response via ~500 chunks, that's ~25M character operations on the hot streaming path.
| state.emittedText = rawDelta; | |
| // Use a length counter instead of full-string copy in cumulative mode | |
| state.emittedLength = rawDelta.length; |
Alternatively, track emittedLength as a number counter and use rawDelta.slice(state.emittedLength) for suffix extraction, avoiding O(n) startsWith and copy on every chunk.
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Acknowledged — the cumulative branch is O(prev) per chunk and O(n²) total. Concrete numbers:
- 50 KB / 500 chunks ≈ 12.5 M char-ops on the hot path
- 1 MB / 5 K chunks ≈ 1.25 G char-ops — measurable
I'll defer the optimization to a follow-up PR rather than carry it in this minimal split, because the safe-shape variant is non-trivial:
- Pure
emittedLengthtracking drops thestartsWithsafety check. If an upstream glitches and sends a chunk that no longer prefix-extends, we silently emit garbage from the wrong slice offset. The currentstartsWithis what triggers the cumulative-mode exit (line 91 in this PR). - A correct refactor needs either (a) a periodic verify (e.g. every Nth chunk), or (b) a length-only fast path with a divergence-detection trap (chunk shorter than expected → fall back).
Given current real-world stream sizes (most observed cumulative-delta upstreams produce <100 KB before finish), the perf is below the noise floor of provider RTT. I'd rather land this as is and follow up with the perf refactor + a benchmark covering the 1 MB case.
Tracking as TODO; will not be a blocker for the cumulative-bug-fix landing.
| } | ||
|
|
||
| if ( | ||
| rawDelta.startsWith(state.emittedText) && |
There was a problem hiding this comment.
[Suggestion] Step 4 (prefix-based cumulative mode entry) lacks a minimum length threshold, unlike Step 5 which uses CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20. A non-cumulative provider that emits "foo" then "foobar" would trigger cumulative mode with just 3-char overlap — increasing false-positive risk.
| rawDelta.startsWith(state.emittedText) && | |
| if ( | |
| rawDelta.startsWith(state.emittedText) && | |
| rawDelta.length > state.emittedText.length && | |
| state.emittedText.length >= CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH | |
| ) { |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
I considered this and want to push back: a min-length threshold on prefix-based entry would delay legitimate cumulative-mode detection on streams where the early accumulated text is small.
Concrete failing example:
Chunk 1: "Hi" (2 b)
Chunk 2: "Hi there" (8 b) — cumulative
Chunk 3: "Hi there friend" (15 b) — cumulative
Chunk 4: "Hi there friend, ..." — finally crosses 20 b threshold
With the suggested threshold, chunks 2 and 3 fall through to the non-cumulative append path and we emit "Hi" + "Hi there" + "Hi there friend" → user sees "HiHi thereHi there friend" before chunk 4 finally enters cumulative mode. By then the corrupted prefix is already on screen.
The false-positive case the suggestion guards against ("foo" followed by "foobar" from a non-cumulative provider) requires the second chunk to literally include the previous chunk's full content as a prefix. Real incremental providers don't do this — they emit only the new tokens. The pattern only appears on actually-cumulative providers, which is what we want to detect.
Net: keeping the current aggressive entry. The exact-repeat path (line 106) is the one that warrants a length floor — and that one already has it.
|
|
||
| if ( | ||
| rawDelta === state.emittedText && | ||
| rawDelta.length >= CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH |
There was a problem hiding this comment.
[Suggestion] Step 5 exact-repeat heuristic (rawDelta === emittedText && length >= 20) can false-positive on legitimate repeated content from non-cumulative providers. If a model genuinely repeats a ≥20-char chunk verbatim (e.g., a repeated paragraph), it gets silently dropped. Consider requiring two consecutive identical chunks before locking in cumulative mode:
| rawDelta.length >= CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH | |
| state.repeatCount = (state.repeatCount ?? 0) + 1; | |
| if (state.repeatCount >= 2) { | |
| state.cumulativeMode = true; | |
| } | |
| return ''; | |
| } | |
| state.repeatCount = 0; |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Considered. The trade-off:
| Behaviour | Current (1× lock-in) | Suggestion (2× lock-in) |
|---|---|---|
| Cumulative provider exact-repeats accumulated text | silent on 2nd → correct | emits 2× → first dup leaks |
| Model legitimately repeats a ≥20-char chunk verbatim | silent on 2nd → minor loss | preserves 2nd; silent from 3rd |
The leftover failure mode of the suggested variant: cumulative providers do still leak one full duplicate of the response prefix into the live preview before the lock-in fires. That defeats the demo evidence where current behaviour drops the amplification ratio cleanly to 1.0× starting at chunk 2.
Counter-evidence for the false-positive concern: in practice token streaming chunks the model output, so a literal exact-repeat of a ≥20-char chunk requires the model to repeat AND the tokenizer to align the boundaries identically — extremely rare. Per-stream observability (added in commit 136017f — debugLogger.debug traces at every cumulative-mode transition) gives us a way to detect if this ever fires falsely in production.
Defer for now; revisit if observability traces show false-positive entries.
| // Some OpenAI-compatible providers send accumulated content in each | ||
| // delta.content field. Normalize that shape to incremental suffixes before the | ||
| // Gemini stream layer appends it to the live transcript. | ||
| function normalizeStreamingTextDelta( |
There was a problem hiding this comment.
[Suggestion] normalizeStreamingTextDelta has zero logging or observability. When cumulative mode activates, deactivates, or chunks are silently suppressed (returning ""), there is no trace. This makes diagnosing output truncation/duplication bugs extremely difficult. Use the existing debugLogger to emit trace-level events at mode transitions.
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Adopted in commit 136017f. Three new debugLogger.debug trace points on the existing CONVERTER channel:
- entry via prefix-overlap (
prev=Nb -> curr=Mb) - entry via exact-repeat (
Nb) - exit on prefix mismatch in cumulative mode
No perf impact when the channel is disabled (the default in production).
|
|
||
| // Handle text content | ||
| if (typeof choice.delta?.content === 'string') { | ||
| const normalizedContent = normalizeStreamingTextDelta( |
There was a problem hiding this comment.
[Suggestion] Normalization is applied unconditionally to all providers via convertOpenAIChunkToGemini. The heuristic is probabilistic — any non-cumulative provider whose chunk coincidentally starts with the previous full text gets misclassified. The codebase already has per-provider feature flags (e.g., splitToolMedia). Consider adding an opt-out mechanism (e.g., normalizeCumulativeDeltas?: boolean in ContentGeneratorConfig, defaulting to true for backward compatibility) for known-incremental providers.
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Defer. Reasoning:
- Auto-detection has worked for every cumulative-delta provider seen in the issue queue (总是显示一下错误 #3279 / 回答时经常出现频闪的情况 #1184 / 终端界面无限滚动/刷新循环 #3838 / 0.15.6 界面输出内容过程中闪烁,之前是展开过多才闪烁 #3806 — all DashScope variants). No cross-provider misclassification has been reported.
- The detection itself is conservative — exact-repeat needs ≥20 chars, prefix-overlap needs
startsWithto actually match. Random non-cumulative providers do not produce these patterns by accident (real incremental SSE streams emit only token suffixes, never the prefix). - Adding a
ContentGeneratorConfig.normalizeCumulativeDeltasflag now would be premature: we don't have a known provider that needs to opt out. - The newly added
debugLogger.debugtraces (commit 136017f) give us the diagnostic tool to detect a misclassification in the wild without a config knob — if a user reports content loss on a non-cumulative upstream, we'd see the entry trace and know to add an opt-out.
Will revisit if a real opt-out case appears.
…alization Address review on #3896: - Reasoning path coverage: cumulative `delta.reasoning_content` chunks now have an explicit test that mirrors the text-content cumulative test, asserting the same suffix-extraction behaviour and that emitted parts carry `thought: true`. - Cumulative-mode exit path: previously untested branch (converter.ts: `state.cumulativeMode = false` on prefix mismatch). New test establishes cumulative mode with two prefix-extending chunks, then sends a non-matching chunk; asserts the chunk is appended verbatim (no silent loss). - Add `debugLogger.debug` traces at each cumulative-mode transition (entry via prefix overlap, entry via exact-repeat, exit on prefix mismatch). Opt-in via existing CONVERTER debug channel — no perf impact when disabled. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
@wenshao thanks for the thorough review. Pushed Adopted in
Tests: 85 / 85 (was 83) — no regressions in the converter test file. Pushed back / deferred (with rationale in inline replies):
Happy to revisit any of the deferrals — the inline threads have the full reasoning chain. |
| debugLogger.debug( | ||
| 'normalizeStreamingTextDelta: exiting cumulative mode (chunk does not match prior accumulated text)', | ||
| ); | ||
| state.cumulativeMode = false; |
There was a problem hiding this comment.
[Critical] 退出累积模式后 emittedText 未被重置,导致后续状态污染。
当 cumulativeMode = false 后,代码穿透到非累积逻辑最终执行 state.emittedText += rawDelta(第 121 行),将新 chunk 拼接到旧的累积文本后。如果同一流中 provider 后续恢复发送累积式 chunk,startsWith 前缀检查会因 emittedText 已损坏而失败,导致累积内容原样输出(重复)。
| state.cumulativeMode = false; | |
| debugLogger.debug( | |
| 'normalizeStreamingTextDelta: exiting cumulative mode (chunk does not match prior accumulated text)', | |
| ); | |
| state.cumulativeMode = false; | |
| state.emittedText = rawDelta; | |
| return rawDelta; |
现有测试 should exit cumulative mode 仅验证输出正确,未检查 emittedText 内部状态。建议增加后续累积恢复的断言。
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Fixed in commit b2da918. When exiting cumulative mode the function now sets state.emittedText = rawDelta and returns immediately, so the new chunk becomes the fresh baseline for subsequent prefix checks instead of being appended to the stale accumulated text.
| reasoning?: string | null; | ||
| } | ||
|
|
||
| const CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20; |
There was a problem hiding this comment.
[Suggestion] CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20 阈值缺乏选值理由的文档说明。
为什么是 20 而不是 10 或 50?该阈值决定了精确重复 chunk 是被静默吞入(≥20)还是穿透输出(<20),但没有任何注释解释选值依据。采用不同 tokenization 策略的 provider 可能需要不同阈值。
| const CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20; | |
| // 累积式 provider 通常按 token(≥2 字符)而非单个字符发送增量, | |
| // 因此 < 20 字符的精确重复大概率是合法增量内容(如 "ha" + "ha")。 | |
| const CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20; |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added in commit fc2492c. The comment now explains: cumulative providers typically emit whole words/phrases (≥20 chars), so sub-word repeats like "ha" are more likely to be valid incremental content.
| } | ||
|
|
||
| // Handle text content | ||
| if (typeof choice.delta?.content === 'string') { |
There was a problem hiding this comment.
[Suggestion] content 与 reasoning 的空结果处理不对称。
reasoning 文字在第 1103 行有 if (normalizedReasoningText) 守卫来跳过空结果,但 content 文字(第 1110-1117 行)直接将可能为空字符串的 normalizedContent 传给 convertOpenAITextToParts,缺少同样的守卫。当 provider 发送被标准化为空的 content delta 时,可能产生空 text part。
| if (typeof choice.delta?.content === 'string') { | |
| if (typeof choice.delta?.content === 'string') { | |
| const normalizedContent = normalizeStreamingTextDelta( | |
| choice.delta.content, | |
| (requestContext.textDeltaState ??= { | |
| emittedText: '', | |
| cumulativeMode: false, | |
| }), | |
| ); | |
| if (normalizedContent) { | |
| parts.push( | |
| ...convertOpenAITextToParts( | |
| normalizedContent, |
或在确认 convertOpenAITextToParts 能正确处理空字符串后添加注释说明。
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Fixed in commit b2da918. The content path now has a if (normalizedContent || choice.finish_reason) guard matching the reasoning path, so empty-string deltas mid-stream no longer produce empty text parts.
| @@ -26,6 +31,8 @@ export interface RequestContext { | |||
| // user message for strict OpenAI-compat servers. See ContentGeneratorConfig | |||
| // for details. | |||
There was a problem hiding this comment.
[Suggestion] textDeltaState 和 reasoningDeltaState 的生命周期通过 ??= 隐式绑定于 RequestContext,但未在类型定义或 createRequestContext 工厂中显式记录。
若未来有人引入 RequestContext 池化复用(它携带 model、modalities 等看似可复用的字段),残留的增量状态会静默破坏后续请求的文本标准化。
建议在 createRequestContext 中显式初始化 textDeltaState: undefined, reasoningDeltaState: undefined,并在接口 JSDoc 中注明这些字段是「流式请求作用域内的可变状态,不可跨请求复用」。
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Added JSDoc to both fields in commit fc2492c. The comments now explicitly say: per-stream mutable state, lazily initialised, must not be shared or reused across requests — stale state will silently corrupt text output. The reasoning channel also notes that the two channels are tracked independently for correct interleaved-chunk deduplication.
| if ( | ||
| rawDelta.startsWith(state.emittedText) && | ||
| rawDelta.length > state.emittedText.length | ||
| ) { |
There was a problem hiding this comment.
[Critical] The prefix-overlap heuristic can corrupt valid incremental streams. A normal OpenAI-style provider may emit chunks like "Here" followed by "Here is..."; those should append to "HereHere is...", but this code treats the second chunk as cumulative and emits only " is...". Similarly, long exact repeats are dropped entirely. That changes model output for standards-compliant incremental providers.
A safer fix is to avoid enabling cumulative normalization purely from text-prefix coincidences. Gate this behavior behind provider/model capability/configuration, or use a detector that does not suppress ambiguous prefix-extension/exact-repeat chunks by default.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
No code change here — this is by design.
A standard OpenAI-protocol incremental provider sends only the new tokens as each delta: chunk N is just the fresh text, never a prefix of chunk N+1. The only way `rawDelta.startsWith(state.emittedText)` can be true is when the provider sends the full accumulated transcript in each chunk (DashScope / some vLLM-compatible deployments). A well-behaved incremental provider never does this — the two content shapes are mutually exclusive.
The scenario in the concern — chunk A = "Here", chunk B = "Here is..." — would require chunk B to include chunk A verbatim as a prefix, which is exactly what cumulative providers do and exactly what incremental providers never do. If a provider did emit chunk B = "Here is..." incrementally, state.emittedText at that point would already contain "Here" + any earlier tokens (e.g. 100+ chars), so rawDelta.startsWith(state.emittedText) would be false.
Adding a config gate for this heuristic would make it harder to use without providing safety for a scenario that cannot occur in practice with compliant providers.
There was a problem hiding this comment.
I re-checked this one and I think the concern still stands. The issue is not that an incremental provider sends the previous transcript; it is that the new incremental chunk text itself can legitimately start with the previous chunk text. With the current heuristic, a valid stream like ["Here", "Here is intentionally repeated."] should render HereHere is intentionally repeated., but it normalizes to Here is intentionally repeated. because the second chunk is treated as cumulative and only the suffix is emitted. The same applies to a legitimate long exact-repeat chunk, which would be dropped entirely after the first occurrence.
So I would not treat this as impossible for compliant incremental providers. Either this normalization needs to be gated to providers/models known to emit cumulative deltas, or the detector needs to avoid suppressing ambiguous prefix-extension/exact-repeat chunks by default.
There was a problem hiding this comment.
Thanks — re-examining your scenario, you'''re right that the literal pattern is reproducible by a non-cumulative provider that legitimately repeats prior text inside the new chunk. I want to enumerate the trade-off more carefully before deciding whether to gate.
The detector is conservative on entry but not on continuation.
Entry paths:
- Prefix-overlap (line 109):
rawDelta.startsWith(emittedText) && rawDelta.length > emittedText.length— your example hits this. - Exact-repeat (line 100): same chunk twice, both ≥20 chars — much harder to hit accidentally.
For entry path #1 to fire on incremental output, the provider must emit a chunk whose payload starts with the entire prior accumulated transcript verbatim. Concrete shape: chunk N = X, chunk N+1 = X + Y where X is the full prior accumulated text including all chunks before N, not just chunk N alone. As the stream grows, this becomes increasingly unlikely for a streaming-incremental provider — a 200-char chunk would need to literally include the full 200-char prior transcript as its prefix.
Mitigations already in place against false-positive entries:
- 1024-byte detection window cap (commit fc2492c / hardened in d132acb): once 1024 bytes of incremental text have accumulated, the baseline freezes and the prefix check stops running. Late-arriving chunks pass through verbatim.
- Cumulative-mode exit (line 91): if a chunk in cumulative mode does not prefix-extend the accumulated text, the function exits cumulative mode and treats the new chunk as the fresh baseline (commit b2da918). So even if entry misfires once, it self-corrects on the next non-matching chunk.
debugLogger.debugtraces on every entry, exit, and suppression path (commit 136017f + d132acb): a misclassification is observable in the wild without re-running with extra instrumentation.
Why I'''d still defer the config gate:
- The window cap means the false-positive surface is bounded to the first 1024 bytes of any stream — past that, the detector is dormant.
- Within that window, hitting entry path pre-release: fix ci #1 by accident requires a provider whose chunk payload coincidentally re-includes the full prior transcript as a prefix. The reported bugs (总是显示一下错误 #3279 / 回答时经常出现频闪的情况 #1184 / 终端界面无限滚动/刷新循环 #3838 / 0.15.6 界面输出内容过程中闪烁,之前是展开过多才闪烁 #3806) all show this is exactly the cumulative-provider shape.
- A
ContentGeneratorConfig.normalizeCumulativeDeltasflag adds a config knob that no current user knows to set, and would default to true anyway (because cumulative providers are the documented breakage). The detection traces give us the diagnostic lever to add the flag later if a real opt-out case appears.
That said, your point that this is not impossible for compliant incremental providers stands, and I want to record it openly. If you'''d like, I can:
(a) tighten entry path #1 further by requiring the prior baseline to be ≥ some min length (e.g. 50 chars) so trivial prefix-coincidences inside small early chunks don'''t fire it; or
(b) add the explicit per-provider opt-out flag now, even unused.
I'''d slightly prefer (a) over (b) because it preserves auto-detection without a config matrix. Happy to make the call you prefer.
| debugLogger.debug( | ||
| `normalizeStreamingTextDelta: entered cumulative mode (prefix overlap, prev=${state.emittedText.length - suffix.length}b -> curr=${rawDelta.length}b)`, | ||
| ); | ||
| return suffix; |
There was a problem hiding this comment.
[Critical] Short repeated cumulative chunks poison future detection. For a valid cumulative stream like ["Hello", "Hello", "Hello world"], the second short repeat falls through to state.emittedText += rawDelta, making the baseline "HelloHello". The next cumulative extension no longer starts with that baseline, so it is appended verbatim and the user sees duplicated output such as "HelloHelloHello world".
Do not append exact repeats to emittedText just because they are short. Keep a separate last-raw/possible-cumulative state, or avoid mutating the accumulated baseline until cumulative mode is unambiguous.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Fixed in commit b2da918. Short exact-repeats (< 20 bytes) now return the chunk directly without touching emittedText, so the baseline stays valid for future prefix checks. Appending was the root of the "HiHi" poisoning bug. A new test covers the exact sequence from the report.
| ); | ||
| return ''; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] Normal incremental streams now retain and scan the entire emitted transcript. Every ordinary chunk that is not classified as cumulative reaches state.emittedText += rawDelta, and future chunks run prefix checks against that ever-growing string. For long streams this adds avoidable O(n²)-style CPU work and keeps an extra full copy of the transcript in memory.
Consider tracking only bounded recent state until cumulative mode is detected, capping growth, or disabling cumulative detection once the stream is clearly incremental.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Addressed in commit fc2492c. Added CUMULATIVE_DETECTION_WINDOW_BYTES = 1024: once 1024 bytes have been emitted without entering cumulative mode, emittedText stops growing. The prefix-overlap and exact-repeat checks still run against the capped baseline (which is enough for detection), but string concatenation at the fallthrough path is skipped. This bounds per-stream memory to ~1KB for standard incremental providers.
| parts.push({ text: reasoningText, thought: true }); | ||
| const normalizedReasoningText = normalizeStreamingTextDelta( | ||
| reasoningText, | ||
| (requestContext.reasoningDeltaState ??= { |
There was a problem hiding this comment.
[Suggestion] Please add a regression test for interleaving reasoning and visible content in the same stream context. The implementation relies on separate reasoningDeltaState and textDeltaState, but the new tests exercise cumulative reasoning and cumulative content only in isolation. A future refactor that accidentally shares state could still pass while truncating one channel when chunks alternate.
A focused test could send one RequestContext through alternating chunks such as reasoning_content: "Let me think", content: "Here", reasoning_content: "Let me think more", and content: "Here is the answer", then assert both channels emit independent suffixes.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Added in commit b2da918: "should deduplicate interleaved reasoning_content and content channels independently". The test sends alternating reasoning/content chunks through the same RequestContext and asserts both channels emit independent suffixes correctly.
Three fixes from wenshao's CHANGES_REQUESTED reviews:
1. Reset emittedText when exiting cumulative mode (line 94 critical, 2026-05-07T16:23)
— previously the stale prefix was left in place, so a provider that
later re-entered cumulative mode would fail the startsWith check and
emit duplicated content.
2. Short exact-repeat passthrough without baseline mutation (line 107 critical, 2026-05-07T22:41)
— when rawDelta === emittedText and length < 20, the old code appended
to emittedText (e.g. "Hi" → "HiHi"), poisoning future prefix checks.
Now passes through unchanged so the baseline stays valid for the next
prefix-overlap detection.
3. Empty-content guard on the content normalisation call (line 1110 suggestion, 16:23)
— add `if (normalizedContent || choice.finish_reason)` so mid-stream
chunks that normalise to "" do not push an empty text part. Still
calls convertOpenAITextToParts on finish_reason to flush buffered
tagged-thinking content.
Test additions (89 → 89, 4 new cases):
- resumption after cumulative-mode exit (verifies emittedText reset)
- short-repeat baseline preservation (verifies passthrough without poison)
- reasoning_content cumulative normalisation path
- interleaved reasoning_content + content channels (independent state objects)
Generated with AI
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add CUMULATIVE_DETECTION_WINDOW_BYTES=1024: stop growing emittedText once 1024 bytes emitted without entering cumulative mode, bounding per-stream memory for standard incremental providers. - Add comment explaining CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH=20 choice. - Add JSDoc to textDeltaState/reasoningDeltaState in RequestContext warning that these are per-stream-scoped mutable state and must not be reused. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
| state.emittedText = rawDelta; | ||
| return rawDelta; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Critical] The emittedText.startsWith(rawDelta) rewind branch returns '' with no debugLogger.debug trace.
This is the only one of the three content-suppression paths that lacks a trace. If a provider sends a chunk that is a strict prefix of the accumulated text (network retry, proxy retransmit), content disappears silently with zero observability — the hardest failure mode to debug in production.
| if (state.emittedText.startsWith(rawDelta)) { | |
| debugLogger.debug( | |
| `normalizeStreamingTextDelta: cumulative rewind suppression (emitted=${state.emittedText.length}b, chunk=${rawDelta.length}b)`, | |
| ); | |
| return ''; | |
| } |
— glm-5.1 via Qwen Code /review
There was a problem hiding this comment.
Adopted in commit d132acb. The cumulative-rewind suppression branch now emits a debugLogger.debug trace mirroring the other two suppression paths, so all three silent-suppression sites have observability parity:
- entry via prefix-overlap (existing)
- entry via exact-repeat (existing)
- rewind suppression (this fix) —
cumulative rewind suppression (emitted=Nb, chunk=Mb) - exit on prefix mismatch (existing)
Now any production report of "content went missing on a particular chunk" can be traced through the CONVERTER debug channel without re-running with extra instrumentation.
| state.emittedText = rawDelta; | ||
| return rawDelta; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] After emittedText is frozen at the CUMULATIVE_DETECTION_WINDOW_BYTES (1024) cap, the non-cumulative prefix-overlap check (rawDelta.startsWith(state.emittedText)) and exact-repeat check (rawDelta === state.emittedText) continue to run against the stale baseline. If a future chunk coincidentally starts with the frozen 1024-char prefix, the function would incorrectly enter cumulative mode and produce a content-duplicating suffix.
Consider adding an early return when the cap is reached and cumulative mode is not active:
| if ( | |
| !state.cumulativeMode && | |
| state.emittedText.length >= CUMULATIVE_DETECTION_WINDOW_BYTES | |
| ) { | |
| return rawDelta; | |
| } | |
| if ( | |
| rawDelta.startsWith(state.emittedText) && | |
| rawDelta.length > state.emittedText.length | |
| ) { |
— glm-5.1 via Qwen Code /review
There was a problem hiding this comment.
Adopted in commit d132acb. Once emittedText.length >= CUMULATIVE_DETECTION_WINDOW_BYTES (1024) and cumulative mode has not been entered, normalizeStreamingTextDelta now early-returns the chunk verbatim before the prefix-overlap and exact-repeat checks run. The frozen baseline can no longer be matched against late-arriving chunks, eliminating the false-positive entry path you described.
New regression test added: should not enter cumulative mode after the detection window cap is reached on a non-cumulative stream — sends 100×20-char incremental chunks (2000 bytes, well past the 1024 cap) and asserts every chunk passes through verbatim.
| const parts = chunk.candidates?.[0]?.content?.parts; | ||
| expect(parts).toEqual([]); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Suggestion] Missing boundary-value tests for CUMULATIVE_DETECTION_WINDOW_BYTES (1024) and CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH (20).
(1) No test exercises the 1024-char cap. After 1024 chars of incremental output, emittedText stops growing. A test sending >1024 chars of non-cumulative chunks should verify the cap holds.
(2) No test exercises the exact 20-char threshold. Tests use 2-char ("ha") and 49-char repeats but not 19/20 chars. A boundary test would catch an off-by-one in the >= comparison.
| it('should enter cumulative mode at exactly the 20-char repeat threshold', () => { | |
| const ctx = withStreamParser(); | |
| const twenty = '12345678901234567890'; // exactly 20 chars | |
| const emitted = [twenty, twenty].map((content, index) => { | |
| const chunk = converter.convertOpenAIChunkToGemini( | |
| { | |
| object: 'chat.completion.chunk', | |
| id: `chunk-boundary-20-${index}`, | |
| created: 456 + index, | |
| choices: [{ | |
| index: 0, | |
| delta: { content }, | |
| finish_reason: null, | |
| logprobs: null, | |
| }], | |
| model: 'gpt-test', | |
| } as unknown as OpenAI.Chat.ChatCompletionChunk, | |
| ctx, | |
| ); | |
| return chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? ''; | |
| }); | |
| expect(emitted).toEqual([twenty, '']); | |
| }); | |
| it('should passthrough repeats below the 20-char threshold', () => { | |
| const ctx = withStreamParser(); | |
| const nineteen = '1234567890123456789'; // 19 chars | |
| const emitted = [nineteen, nineteen].map((content, index) => { | |
| const chunk = converter.convertOpenAIChunkToGemini( | |
| { | |
| object: 'chat.completion.chunk', | |
| id: `chunk-boundary-19-${index}`, | |
| created: 456 + index, | |
| choices: [{ | |
| index: 0, | |
| delta: { content }, | |
| finish_reason: null, | |
| logprobs: null, | |
| }], | |
| model: 'gpt-test', | |
| } as unknown as OpenAI.Chat.ChatCompletionChunk, | |
| ctx, | |
| ); | |
| return chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? ''; | |
| }); | |
| expect(emitted).toEqual([nineteen, nineteen]); | |
| }); |
— glm-5.1 via Qwen Code /review
There was a problem hiding this comment.
Partially adopted in commit d132acb.
Cap test (1024 bytes): added should not enter cumulative mode after the detection window cap is reached on a non-cumulative stream — sends 100 × 20-char incremental chunks (2000 bytes total, well past the cap) and asserts every chunk emits verbatim. This catches both an off-by-one in the cap comparison and any future regression that lets cumulative detection fire after the freeze.
Boundary tests (19/20 chars): these already exist (added during the previous review round, commit fc2492c):
should enter cumulative mode on exact 20-char repeat (at threshold)— 20-char × 2 → second emits ``, then prefix-extension emits suffix onlyshould pass through 19-char exact repeat without entering cumulative mode (below threshold)— 19-char × 2 passes both verbatim, then a prefix-extension at 25+ chars triggers cumulative entry
The 19/20 pair already provides off-by-one protection on >= versus >. Marking this thread as resolved unless you want a different framing.
| const parts = chunk.candidates?.[0]?.content?.parts; | ||
| expect(parts).toEqual([]); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Suggestion] No test covers the cumulative-mode rewind branch (emittedText.startsWith(rawDelta) where rawDelta is a strict prefix, shorter than emittedText). The existing "ignore repeated cumulative chunks" test covers only an exact-length repeat. A rewind scenario (e.g., provider re-sends a shorter accumulated chunk after a longer one) is untested:
Chunk 1: "Hello" → emits "Hello", emittedText = "Hello"
Chunk 2: "Hello World" → cumulative mode, emits " World"
Chunk 3: "Hello" → rewind branch (strict prefix), emits ""
Chunk 4: "Hello World!" → suffix extraction, emits "!"
This branch is the same one that lacks a debug trace (see the Critical finding on converter.ts:88). A test here would serve double duty as both coverage and a regression guard.
— glm-5.1 via Qwen Code /review
| const parts = chunk.candidates?.[0]?.content?.parts; | ||
| expect(parts).toEqual([]); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Suggestion] Consider adding a test with both reasoning_content and content in the same chunk delta. The current interleaved test alternates the two fields across separate chunks, but providers may emit both in a single delta. When both are present, two normalizeStreamingTextDelta calls execute on the same chunk against independent state objects.
— glm-5.1 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Independent re-review against the latest commit (fc2492c). Tests + typecheck + lint all clean locally; the iterative refinement (b2da918 / 136017f / fc2492c) has resolved the bulk of earlier critical concerns. Posting a few residual observations as inline comments — none are blockers in isolation, but #1 (prefix-overlap floor) is the one I'd most like to see addressed before merge.
| } | ||
|
|
||
| if ( | ||
| rawDelta.startsWith(state.emittedText) && |
There was a problem hiding this comment.
[Suggestion] Prefix-overlap entry still has no minimum-length floor.
The exact-repeat branch below (line 124) requires ≥20 chars before locking cumulative mode, but this branch fires on any prefix overlap of any length. The defense — "a standard OpenAI-protocol incremental provider only sends new tokens, so a chunk would never start with prior accumulated text" — is reasoning from spec, not from observed behavior across all OpenAI-compat backends.
If any provider ever batches tokens unevenly such that a newly batched chunk happens to begin with the prior chunk text (re-send of last token + extra, retry chunk, proxy retransmit, etc.), this branch silently flips the stream into cumulative mode and the very next chunk gets stripped to its suffix relative to a corrupted baseline. That's a hard-to-reproduce regression.
Adding a defensive floor — e.g. require state.emittedText.length >= 8 before allowing prefix-based entry — costs nothing on real cumulative streams (DashScope-class providers exhibit the pattern repeatedly across many chunks; one chunk later the threshold is met) but materially shrinks the false-positive window.
if (
state.emittedText.length >= CUMULATIVE_DELTA_PREFIX_MIN_BASELINE &&
rawDelta.startsWith(state.emittedText) &&
rawDelta.length > state.emittedText.length
) {
// enter cumulative mode
}| return suffix; | ||
| } | ||
|
|
||
| if (state.emittedText.startsWith(rawDelta)) { |
There was a problem hiding this comment.
[Suggestion] This silence path has no debugLogger.debug trace, unlike the other two cumulative-mode transitions.
The prefix-extension entry (line 117), exact-repeat entry (line 126), and cumulative-mode exit (line 100) all log. But this branch — "new chunk is a strict prefix of accumulated text, drop it" — silently returns ''. Network retransmits, proxy retries, and provider-side reconnect resends are exactly the scenarios where a developer needs to know whether this branch fired.
One line completes the observability story:
if (state.emittedText.startsWith(rawDelta)) {
debugLogger.debug(
`normalizeStreamingTextDelta: dropping shorter cumulative re-send (${rawDelta.length}b ≤ ${state.emittedText.length}b)`,
);
return '';
}| } | ||
|
|
||
| if ( | ||
| rawDelta.startsWith(state.emittedText) && |
There was a problem hiding this comment.
[Suggestion] After emittedText freezes at the 1024-byte cap, this prefix check (and the exact-repeat check on line 123) keep running against the stale baseline.
For a long incremental stream where some later chunk happens to begin with that exact frozen 1024-byte prefix, cumulative mode falsely engages mid-stream. Constructible scenarios: regenerated boilerplate code, repeated template rendering, agent retries.
Two defensive patterns, either is fine:
- Skip cumulative detection once the cap is reached and
cumulativeModeis still false — rationale: real cumulative providers exhibit the pattern from chunks 2–3, not after 1KB of clean incremental output. - Track a
frozenflag separately and gate the entry checks behind!frozen.
// Option 1 — minimal change
const capped = state.emittedText.length >= CUMULATIVE_DETECTION_WINDOW_BYTES;
if (
!capped &&
rawDelta.startsWith(state.emittedText) &&
rawDelta.length > state.emittedText.length
) {
// ...
}
if (!capped && rawDelta === state.emittedText) {
// ...
}| } | ||
|
|
||
| if (state.emittedText.length < CUMULATIVE_DETECTION_WINDOW_BYTES) { | ||
| state.emittedText += rawDelta; |
There was a problem hiding this comment.
[Suggestion] Test gap — no regression test for the 1024-byte cap itself.
A unit test that streams >1024 bytes of pure incremental chunks and asserts that subsequent prefix-overlap detection still produces the correct verbatim output would lock the cap behavior against accidental future tweaks. Equally, a boundary test at exactly 19/20 chars for CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH would pin that threshold.
These are cheap to add and would catch threshold-tweak regressions that the existing tests miss.
| // Some OpenAI-compatible providers send accumulated content in each | ||
| // delta.content field. Normalize that shape to incremental suffixes before the | ||
| // Gemini stream layer appends it to the live transcript. | ||
| function normalizeStreamingTextDelta( |
There was a problem hiding this comment.
[Suggestion / non-blocking] normalizeStreamingTextDelta + the two constants + the StreamingTextDeltaState type are self-contained and stateful. converter.ts is already 1300+ lines.
Moving this block to a new streamingDeltaNormalizer.ts (with its own focused unit tests) would:
- isolate the lifecycle contract — the "per-stream, must not be reused" warning becomes a module-level invariant rather than a JSDoc on two field names buried in
RequestContext; - shrink the surface of
converter.ts; - make the test file naturally narrower than the current 80-test omnibus.
Not blocking — fine to land as-is and refactor in a follow-up.
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Reasoning channel lacks cumulative mode exit and exact-repeat entry tests. The content channel has dedicated tests (should exit cumulative mode, should resume prefix detection cleanly after exiting, should ignore repeated cumulative chunks) but the reasoning channel only has basic suffix normalization. While both share normalizeStreamingTextDelta, the integration points differ (separate state objects, separate push logic), so the reasoning channel's cumulative edge cases are untested.
[Suggestion] CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20 boundary untested. Existing tests cover well below ('ha', 2 chars) and well above (47 chars) but not 19 vs 20. A threshold change could alter behavior without test failures.
| return suffix; | ||
| } | ||
|
|
||
| if (state.emittedText.startsWith(rawDelta)) { |
There was a problem hiding this comment.
[Suggestion] Rewind branch returns '' without updating state, risking text duplication on provider backtrack+divergence.
In cumulative mode, when emittedText.startsWith(rawDelta) (provider sends a shorter prefix), this branch returns '' but leaves state.emittedText unchanged. If the next chunk diverges from the stale longer emittedText, cumulative mode exits and the new chunk is emitted verbatim — duplicating the previously-emitted suffix.
| if (state.emittedText.startsWith(rawDelta)) { | |
| if (state.emittedText.startsWith(rawDelta)) { | |
| debugLogger.debug( | |
| 'normalizeStreamingTextDelta: provider backtracked to shorter prefix, resetting baseline', | |
| ); | |
| state.cumulativeMode = false; | |
| state.emittedText = rawDelta; | |
| return ''; | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
This was already addressed in commit d132acb (pushed before this auto-review). The cumulative-rewind branch in converter.ts now emits a debugLogger.debug trace mirroring the other two suppression paths, so all three silent-suppression sites have parity:
- entry via prefix-overlap
- entry via exact-repeat
- rewind suppression —
cumulative rewind suppression (emitted=Nb, chunk=Mb)
Live source (line 97-99):
if (state.emittedText.startsWith(rawDelta)) {
debugLogger.debug(
`normalizeStreamingTextDelta: cumulative rewind suppression (emitted=${state.emittedText.length}b, chunk=${rawDelta.length}b)`,
);
return '';
}The duplication-on-divergence scenario you described is also covered by the existing rewind-then-extension test (should suppress cumulative rewind (provider re-sends shorter accumulated string)), which verifies that after a rewind the provider can resume extension without spurious duplicates. No code change needed.
| rawDelta.startsWith(state.emittedText) && | ||
| rawDelta.length > state.emittedText.length | ||
| ) { | ||
| const prevLen = state.emittedText.length; |
There was a problem hiding this comment.
[Suggestion] Prefix-overlap heuristic has no minimum-length floor for entering cumulative mode, making it vulnerable to false-positive on short coincidental prefix extensions from a non-cumulative provider via a reassembling proxy.
The exact-repeat branch below (line 124) requires ≥20 chars before locking cumulative mode, but this prefix-overlap branch fires on any prefix extension regardless of length. Consider adding a minimum emittedText length threshold (e.g., 50 chars) before allowing prefix-overlap entry, reducing false-positive probability while still detecting genuine cumulative providers that emit full sentences.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Re-examining this one in the post-d132acb97 state.
You're right that the prefix-overlap entry path has no min-length floor, in contrast to the exact-repeat branch which requires >= CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH (20). I want to keep the asymmetry rather than add a floor here, and surface the reasoning so a future maintainer can revisit:
Why the two branches are intentionally asymmetric:
- Exact-repeat (
rawDelta === emittedText) is high false-positive risk on its own — a non-cumulative provider that repeats a 2-char word like"ha"or a punctuation token is the most common collision pattern. Hence the 20-char floor. - Prefix-overlap (
rawDelta.startsWith(emittedText) && rawDelta.length > emittedText.length) requires the new chunk to literally start with the entire previously-emitted text. For a non-cumulative provider whose chunks are independent tokens, this collision needs the new chunk to coincidentally reproduce the full transcript so far as its prefix — which is exponentially less likely as the transcript grows.
What's protecting it instead:
CUMULATIVE_DETECTION_WINDOW_BYTES = 1024(added in fc2492c) — once 1024 bytes of non-cumulative content have accumulated, the baseline freezes and the prefix-overlap check early-returns before evaluating. So the collision window for accidental prefix lock-in is bounded to the first 1024 bytes of a stream.- The proxy-reassembly scenario you describe could in principle land inside that window. The current cost when it does: one false positive locks the stream into cumulative mode, and the next chunk that does not prefix-extend triggers
cumulative-mode exit(b2da918) — which resets the baseline to that chunk and emits it verbatim. So the worst-case observable damage is: one suppressed chunk between the false-positive entry and the next non-prefix-extension chunk.
Bottom line / forward pointer: if we ever see a real-world report of a reassembling proxy poisoning early prefix detection, the next step is to add a small min-length floor on prefix-overlap entry (e.g. prevLen >= 8) symmetric with the exact-repeat floor, rather than dropping prefix detection entirely. Until then, the window cap + exit-and-recover path keep the blast radius bounded.
Deferring without a code change — happy to revisit if a concrete failing trace shows up.
| requestContext, | ||
| Boolean(choice.finish_reason), | ||
| ), | ||
| const normalizedContent = normalizeStreamingTextDelta( |
There was a problem hiding this comment.
[Suggestion] Missing test: empty normalizedContent (returned '') combined with finish_reason: 'stop'. All 8 new cumulative tests use finish_reason: null, so the normalizedContent || choice.finish_reason guard's empty-string path is untested in a cumulative context. This path matters because it flushes buffered tagged-thinking content on stream end even when the delta itself is silenced.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Addressed in commit c4ac09f with a new test: should still call into convertOpenAITextToParts on finish_reason when the cumulative-mode normalized delta is empty.
The test:
- Sends a 2-chunk prefix-extension pair to establish cumulative mode and prime
emittedText. - Sends a final chunk that exact-repeats the accumulated string with
finish_reason: 'stop'. The cumulative branch normalizes the delta to'', exercising thenormalizedContent || choice.finish_reasonguard's empty-string + non-null-finish_reason path. - Asserts
finishReason === 'STOP'propagates and no spurious text part is emitted.
This guards the path you flagged: empty normalized delta in cumulative mode must still drive convertOpenAITextToParts on stream end so any buffered tagged-thinking content gets flushed. Test count is now 98 (was 97); local run is green.
| expect(emitted[2]).toBe(' there, how are you today?'); | ||
| }); | ||
|
|
||
| it('should normalize cumulative reasoning_content deltas to suffixes', () => { |
There was a problem hiding this comment.
[Suggestion] Duplicate test name — nearly identical to the test at line ~2006 ('should normalize cumulative streaming reasoning_content deltas to suffixes' vs 'should normalize cumulative reasoning_content deltas to suffixes'). The names differ by only the word streaming, making it unclear what each uniquely covers. Consider renaming the second test to describe its distinct scenario (e.g., multi-line step-by-step reasoning).
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Addressed in commit c4ac09f. Renamed the test at line 2162 to clarify its distinct scenario:
should normalize cumulative reasoning_content deltas across multi-line growth (newline-prefixed suffixes)
The differentiator vs. the test at line 2006 (should normalize cumulative streaming reasoning_content deltas to suffixes) is now explicit in the title and reinforced by an in-test comment: the multi-line variant grows the accumulated text across \n boundaries so the emitted suffixes themselves begin with \n, exercising the slice arithmetic at the newline. The single-line variant covers the basic intra-sentence prefix extension. Both are kept since they exercise non-overlapping arithmetic edge cases on the same channel.
…n window cap Adds reasoning-channel coverage requested in the most recent review of the cumulative-delta normalization fix: - exact-repeat entry on `reasoning_content` (mirrors the content-channel `should ignore repeated cumulative chunks` test) - cumulative-mode exit on `reasoning_content` when a chunk does not match the prior accumulated text (mirrors `should exit cumulative mode`) - prefix-detection re-entry on `reasoning_content` after the exit path resets the baseline (mirrors `should resume prefix detection cleanly`) Also lands the previously-staged converter.ts hardening from the 2026-05-08 review round and adds the missing detection-window cap test: - emit `debugLogger.debug` trace on the cumulative-rewind suppression branch so the third silent-suppression path has parity with the other two (was the only suppression path with no observability) - early-return when the non-cumulative `emittedText` baseline is frozen at `CUMULATIVE_DETECTION_WINDOW_BYTES` (1024); prevents prefix/exact-repeat checks from running against a stale baseline once the cap is reached - regression test: 2000 chars of incremental chunks past the 1024 cap all pass through verbatim (the cap holds; no late misclassification) Reasoning and content channels share `normalizeStreamingTextDelta` but maintain separate state objects, so the integration points differ; these tests guard against future refactors that accidentally couple them or break the per-channel exit/re-entry behavior. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Addressed in commit d132acb. Reasoning channel cumulative-mode mirror tests (your suggestion 1): Three new tests on the reasoning channel mirroring the content channel coverage you flagged:
Although both channels share 20-char threshold boundary (your suggestion 2): Already covered by the existing tests added in commit fc2492c:
This pair pins the Bonus — also picked up two stragglers from the 2026-05-08 review round in the same commit:
Tests + typecheck + lint clean locally ( |
wenshao
left a comment
There was a problem hiding this comment.
All findings from this review overlap with existing unresolved inline comments (converter.ts:96, converter.test.ts:2162). No new comments to add. — deepseek-v4-pro via Qwen Code /review
|
三处再处理一下就可以 approve:
其余 thread 已经在前几轮 commit 里 "Adopted/Fixed in commit X" 落地,麻烦顺手 Resolve 一下。 |
…icate Two follow-ups from wenshao's 2026-05-09 review: 1. Add coverage for the `normalizedContent || choice.finish_reason` guard on the content path in cumulative mode. All prior cumulative tests use `finish_reason: null`, so the empty-normalized + non-null finish_reason path was untested. Establishes cumulative mode, then sends an exact- repeat final chunk with `finish_reason: 'stop'`; asserts the empty normalized delta does not emit spurious text but `finishReason` still propagates through to the candidate. 2. Rename the second cumulative-reasoning_content test (line 2162) to clarify its distinct scenario — multi-line accumulation where the emitted suffix itself begins with a newline. The previous name was confusingly similar to the test at line 2006 (single-line cumulative reasoning), making it unclear what each uniquely covered. 97 → 98 tests; lint clean. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Re: the 2026-05-09T17:09 review ("All findings from this review overlap with existing unresolved inline comments (converter.ts:96, converter.test.ts:2162). No new comments to add."): Both anchors have now been resolved across two commits: converter.ts:96 (rewind branch trace + downstream concerns at converter.ts:88, :111, :122) — resolved in d132acb (debugLogger trace on the rewind branch + detection-window cap early-return). converter.test.ts:2162 (duplicate test name + missing boundary coverage) — initial boundary coverage landed in d132acb; the rename and the empty-cumulative + finish_reason coverage landed in c4ac09f just now. Inline reply trail for every thread cited by the auto-review is now updated; nothing is sitting unanswered. Local verification ( Lint also clean. Ready for re-review — please dismiss the CHANGES_REQUESTED token if the resolutions above land cleanly on your end. |
|
All @wenshao's review feedback addressed across d132acb + c4ac09f: reasoning channel cumulative-mode coverage (3 mirror tests), CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH=20 boundary at 19 vs 20, 1024-byte window cap protection test, empty cumulative + finish_reason path, duplicate test name renamed. 98/98 converter tests passing, CI all green, mergeable. Could you take another look? |
| state.cumulativeMode = false; | ||
| // Reset baseline to current chunk so future prefix checks use fresh state. | ||
| state.emittedText = rawDelta; | ||
| return rawDelta; |
There was a problem hiding this comment.
[Critical] The CUMULATIVE_DETECTION_WINDOW_BYTES early-return guard fires before the prefix-extension and exact-repeat checks. If a cumulative provider sends a first chunk larger than 1024 chars, emittedText immediately exceeds the cap. On the second chunk the guard returns rawDelta verbatim, permanently disabling cumulative detection for the entire stream and causing massive text duplication.
| return rawDelta; | |
| // Move the cap guard (lines 109-113) so it gates only the fallback concatenation | |
| // path (line 135-138), not the prefix-extension/exact-repeat detection. The cap | |
| // should prevent emittedText growth in non-cumulative fallback, not gate | |
| // cumulative detection itself. |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Confirmed and fixed in a21ac59.
Walked the control flow with a 1500-char first cumulative chunk:
- Chunk 1 (1500 chars):
emittedText.length === 0→ first-chunk path setsemittedText = rawDelta(1500 chars), returns chunk. - Chunk 2 (1700 chars, cumulative):
cumulativeMode = false,emittedText.length (1500) >= 1024→ early-return guard fires → returns chunk verbatim. First 1500 chars duplicated. Bug confirmed.
The window cap's original goal (bounding per-stream memory for non-cumulative streams) is already enforced by the concat guard at the fallback path (emittedText += rawDelta only runs below the cap). The early-return at the detection path was redundant defense-in-depth and harmful to real cumulative streams with large initial chunks.
Fix: dropped the early-return guard on the detection path. Prefix-overlap and exact-repeat checks now always run; only the fallback concat path is gated by the cap. Added a regression test (should detect cumulative mode even when the first chunk exceeds the detection window cap) with 1500/1700/1750-char chunks. The existing "cap is reached on a non-cumulative stream" test still passes because incremental chunks (20 chars) never satisfy the rawDelta.length > emittedText.length (1024) strict-longer check anyway.
| rawDelta.startsWith(state.emittedText) && | ||
| rawDelta.length > state.emittedText.length | ||
| ) { | ||
| const prevLen = state.emittedText.length; |
There was a problem hiding this comment.
[Suggestion] Swap the conditions so the cheaper length check runs first. When rawDelta.length <= state.emittedText.length, startsWith is always false but incurs call overhead.
| const prevLen = state.emittedText.length; | |
| if ( | |
| rawDelta.length > state.emittedText.length && | |
| rawDelta.startsWith(state.emittedText) | |
| ) { |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Applied in a21ac59. The conditions are now ordered length-first so startsWith is skipped when rawDelta cannot possibly be a strict extension.
… cap Address wenshao's [Critical] review on PR #3896: the CUMULATIVE_DETECTION_WINDOW_BYTES early-return guard was firing before the prefix-overlap and exact-repeat checks, so cumulative providers whose first chunk exceeds the 1024-byte cap had detection permanently disabled, causing the entire first chunk to be duplicated on the second chunk. The cap's original goal — bounding per-stream memory for non-cumulative streams — is already enforced by the concat guard at the fallback path (`emittedText += rawDelta` runs only below the cap). The early-return guard on the detection path was redundant defense-in-depth and harmful to real cumulative streams with large initial chunks. Also applies the [Suggestion] from the same review pass: swap the prefix-overlap conditions so the cheap length check runs before `startsWith`, avoiding the call when `rawDelta` cannot be a strict extension. Adds a regression test covering the cumulative-first-chunk-large case (1500/1700/1750-char chunks) to prevent re-introduction. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
| expect(emitted[2]).toBe(' extra'); | ||
| }); | ||
|
|
||
| it('should not enter cumulative mode after the detection window cap is reached on a non-cumulative stream', () => { |
There was a problem hiding this comment.
[Suggestion] 测试名称/注释与实际测试内容不符。测试声称验证"以冻结的 1024 字符基线开头的后续 chunk 不会被误分类为累积模式",但测试数据(100 个互不相同的增量 chunk,如 chunk000-payload__)中没有任何 chunk 以之前累积的文本开头。它实际只验证了一个较弱的断言:超过上限后普通增量 chunk 能正确透传,并未验证前缀检查与冻结 emittedText 之间的交互行为。
建议扩展测试数据,在 1024 字符边界之后添加一个恰好以冻结前缀开头的新文本 chunk,验证其被逐字透传而不误入累积模式。
| it('should not enter cumulative mode after the detection window cap is reached on a non-cumulative stream', () => { | |
| // 在 1024 字符边界之后的 chunk 列表中,添加一条以冻结基线开头的新 chunk: | |
| // 例如在前 51 个 chunk 累积到 1020+ 字符后,第 52 个 chunk 以已累积文本开头 | |
| // const chunks = [...]; chunks.push(`chunk051-payload__extra_content_that_starts_with_frozen_prefix`); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Verdict: legitimate test-quality issue, confirmed. Addressed in commit cdc7b6b.
You're right on both counts:
- The fixture (100 distinct
chunk###-payload__chunks) has no internal prefix overlap, so prefix/exact-repeat detection never fires regardless of the cap — the comment promised a "trailing chunk that ... starts with what would be the frozen 1024-char baseline" but no such chunk exists. - Investigating this revealed something more interesting: the cap is not an early-return guard against misclassification. It only bounds
emittedTextmemory growth. A trailing chunk that genuinely overlaps with the frozen 1026-byte baseline would still enter cumulative mode via the prefix-overlap branch at lines 112–124 (verified by simulation:state.cumulativeMode = true, suffix-only emission). So the original test name's stronger claim wasn't backed by the implementation either.
Fix taken: renamed to should pass incremental chunks through verbatim past the detection window cap (none of them overlap) and rewrote the comment to (a) describe what the fixture actually exercises (incremental passthrough past the cap, guards against future regressions that might break the passthrough branch once emittedText.length >= 1024) and (b) explicitly call out the case it does NOT cover (a ≥1024-byte exact-prefix coincidence on a true incremental stream — vanishingly unlikely in practice but undefended). Did not add a "counterexample" test that would assert the misclassification-after-cap behavior, since that would either lock in current (arguably incorrect) behavior or require a real defense-in-depth change that's out of scope for this PR.
All 99 tests in converter.test.ts still pass.
The test previously claimed to verify that a trailing chunk arriving after the cap "would have" started with the frozen 1024-byte baseline, but no such chunk existed in the fixture. The 100 distinct incremental chunks never overlapped each other, so prefix/exact-repeat detection never fired regardless of the cap. Rename and re-comment the test to reflect the scenario it actually exercises (incremental passthrough past the cap) and explicitly call out the case it does NOT cover. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
All feedback through round 5 addressed and pushed:
99/99 converter tests passing, CI green, mergeable. Ready for re-review. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
Code ReviewOverviewThis PR addresses a long-standing class of duplicate-streaming bugs (#3279, #1184, #3838, #3806) where certain OpenAI-compatible upstreams (notably DashScope / 阿里云百炼 Coding Plan) ship The fix introduces Surface area: 3 files, +914 / −8 (786 of the additions are tests). Strengths
Concerns / suggestions1. (Minor) Frozen baseline can still match a coincidental 1024-byte prefixAfter
In practice the probability of a 1024-byte exact-prefix coincidence on a real incremental stream is vanishingly small, and the acknowledgement is fair. Worth surfacing in code (not just the test comment): a one-line comment near the cap-check or the prefix-overlap branch would help future maintainers spot the trade-off. Not a blocker. 2. (Minor) O(N) prefix check on potentially large baselinesIn cumulative mode there is no cap on 3. (Style) Magic numbers are well-commented but worth surfacing
4. (Minor) Behavior when cumulative mode exits onto unrelated contentThe exit path resets Risk assessment
VerdictApprove. Targeted, well-scoped fix for a real high-impact bug class. Test coverage is unusually thorough — most concerns above are documentation/perf-margin notes rather than correctness issues, and the PR has clearly already absorbed multiple rounds of careful review (5 review-driven commits visible in the history). Ship it. |
wenshao
left a comment
There was a problem hiding this comment.
Did an end-to-end local verification of this PR (HEAD cdc7b6b0e) — built core, then ran four verification layers: unit tests, scripted algorithm replay, corner-case probes, and a side-by-side comparison against a simulated main to confirm the duplication is actually eliminated.
Verified working ✓
- All 99 tests pass (98 claimed + 1 I missed in the count).
- Core fix is end-to-end effective. I simulated a 9-chunk DashScope-style cumulative stream of a 304-byte Markdown table and ran it through both
main(chunks appended verbatim) and this PR's converter:- main: user-visible output 1578 bytes (5.2× amplification), table header
| Syntax | Meaning |appears 8×⚠️ - PR: user-visible output 304 bytes (1.0× amplification), table header appears 1× ✓
- PR output is byte-equal to the original final text. Direction matches the table-sentinel evidence in your description (14× → 1×).
- main: user-visible output 1578 bytes (5.2× amplification), table header
- Normal incremental streams pass through unchanged — verified by feeding a 13-chunk incremental stream of the same content; output reconstructs the original exactly.
- Channel separation works — interleaved
content+reasoning_contentchunks each maintain independent state and don't cross-contaminate.
The fundamental algorithm is sound. However, two corner cases of the heuristic produce real user-visible regressions and need to be addressed before merge.
Blocking findings
1. CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 20 causes silent data loss on legitimate repeats
The 20-char threshold is too low. Any provider that legitimately emits a chunk equal to a prior chunk of ≥20 characters will have the second chunk silently suppressed:
chunk 1: "import { foo } from './module';" // 31 chars
chunk 2: "import { foo } from './module';" // legitimate repeat (e.g. duplicate import)
// PR output: chunk 1 = full text, chunk 2 = "" ← user loses the second lineI tested boundaries on the converter directly:
| chunk length | exact repeat → emitted on 2nd chunk |
|---|---|
19 chars "X".repeat(19) |
"XXXXXXXXXXXXXXXXXXX" ✓ preserved |
20 chars "X".repeat(20) |
"" |
31 chars "import { foo } from './module';" |
"" |
Realistic triggers: repeated import lines in code generation, repeated emoji sequences, repeated boilerplate (e.g. </div>\n</div>), or duplicated short paragraphs.
This is silent lossy behavior — there's no UI warning, no debug log on the user-facing path. The earlier review thread mentioned ≥50 bytes; please raise to at least 50, probably 80. The cumulative-detection signal it's trying to catch (DashScope replaying a long accumulated buffer) is virtually always hundreds of bytes, so a higher threshold doesn't weaken the catch but dramatically reduces the false-positive surface.
2. The CUMULATIVE_DETECTION_WINDOW_BYTES = 1024 cap can produce visible duplication, not just missed detection
I initially assumed the cap would just disable detection past 1024 B. The actual behavior is worse — when a normal-incremental stream exceeds 1024 B and the upstream then switches to cumulative mode, detection mis-fires and the emitted suffix overlaps content the user has already seen.
Reproduction: 200 incremental chunks (1600 B total accumulated) followed by a single cumulative chunk:
incremental phase: user sees 1600 bytes
internal emittedText capped at 1024 bytes
cumulative chunk arrives (1614 bytes = 1600 accumulated + "|CONTINUATION|")
algorithm: chunk.startsWith(emittedText[:1024]) → true → enter cumulative
emit suffix = chunk[1024:] = ~590 bytes, including bytes 1024-1600 again
user-visible result: 1600 + 590 = ~576 bytes of duplication
Not the targeted use case (DashScope is cumulative from the start), but any "incremental-then-cumulative" hybrid provider will produce visible duplication of about accumulated_length - cap bytes. Fix: track the total emitted length (just the integer, not the string content) separately from the capped baseline, and use it for the suffix slice when entering cumulative mode. Costs ~8 bytes per request.
Non-blocking observations
3. Diverge-with-overlap can produce visible duplication on cumulative exit
Tested:
chunks: ["ABCDEFGHIJ", "ABCDEFGHIJKL", "ABCDEFGHIJKLMN", "GHIJKLMN_NEW"]
// ── chunk 4 doesn't startsWith and isn't startedBy emittedText
// ── exit cumulative, emit chunk 4 verbatim
user-visible: "ABCDEFGHIJKLMN" + "GHIJKLMN_NEW"
// → "GHIJKLMN" appears 2× in user outputDashScope-class cumulative providers don't emit half-overlapping chunks mid-stream, so this doesn't bite the targeted bug, but it's worth documenting in normalizeStreamingTextDelta's JSDoc that "exit cumulative" is a verbatim-emit path with no overlap reconciliation, and the algorithm assumes the diverged chunk is fully fresh content.
4. state.emittedText grows linearly with stream length once cumulative mode is entered
Confirmed: 5 cumulative chunks of growing length → emittedText.length === 10125 bytes. Each successful match does state.emittedText = rawDelta, so for a 100 KB final response the buffer holds the full 100 KB. Single-request scoped, so it's bounded by request lifetime, but worth a JSDoc note: "in cumulative mode state.emittedText retains the full accumulated text until request completion." A future optimization (only retain the last N bytes once we're confident in cumulative mode) is possible but not required now.
Recommendation
The core fix is sound and the targeted DashScope cumulative-duplication bug is verifiably eliminated end-to-end. #1 (raise the exact-repeat threshold to ≥50) is a one-line change and prevents silent data loss in legitimate-repeat scenarios. #2 (track emitted length separately from capped baseline) is a small change that closes the hybrid-provider regression. With those two addressed I'll re-approve.
Verification scripts saved locally (/tmp/verify-pr3896-{L2,L2_5,L2_5b,L3}.mjs) — happy to share for the follow-up commit.
…ength Addresses two e2e-verified findings from wenshao's PR #3896 review (2026-05-13 CHANGES_REQUESTED) where local replay surfaced real user-visible regressions in the cumulative-delta heuristic. 1. Bump CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH from 20 to 64. Any legitimately repeated chunk of 20+ chars (e.g. duplicate import lines, repeated short paragraphs, repeated emoji sequences) was silently suppressed by the prior threshold. Cumulative-buffer replays are virtually always hundreds of bytes, so the higher threshold preserves catch-rate while eliminating the silent-loss surface. 2. Track the true user-visible emitted length separately from the 1024-byte capped baseline. For incremental-then-cumulative hybrid streams (200 incremental chunks totalling 1600 bytes followed by a cumulative chunk), the prior implementation sliced the suffix from byte 1024 of the cumulative chunk and re-emitted ~576 bytes the user had already seen. The new emittedLength field captures the true total so the slice starts at the user-visible boundary; the historical short-repeat-then-extend behaviour is preserved by gating the emittedLength-based slice on the baseline actually having frozen at the cap. Also documents the per-stream/per-channel state lifecycle and the "exit cumulative" verbatim-emit semantics on normalizeStreamingTextDelta. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Thanks for the end-to-end verification @wenshao — the simulated DashScope replay and the 1578B→304B amplification numbers are exactly the kind of evidence that made both findings actionable. Pushed #1 — exact-repeat threshold (blocking → fixed)Raised #2 — slice from real emitted total, not capped baseline (blocking → fixed)Added Regression test mirrors your reproducer: 200 distinct 8-byte incremental chunks (1600B accumulated, well past the 1024B cap) followed by #3 — diverge-with-overlap on cumulative exit (non-blocking → JSDoc)Documented on #4 —
|
wenshao
left a comment
There was a problem hiding this comment.
Re-verified d1cdc1e6f end-to-end:
- 101/101 converter tests pass (3 new regression tests landed)
- #1: threshold raised 20 → 64; 31-char duplicate import line preserved verbatim ✓
- #2: 1614B cumulative-after-1600B-incremental now emits exactly 14B (
|CONTINUATION|) — the ~576B replay is gone ✓ - #3 / #4: JSDoc additions match the verbatim-exit and linear-growth semantics
- End-to-end main vs PR: 5.2× → 1.0× amplification still holds; PR output byte-equal to original
Both blocking findings cleanly resolved. LGTM.
…nLM#3896) [upstream cherry-pick] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Some OpenAI-compatible upstreams (notably specific DashScope / 阿里云百炼 Coding Plan paths) send `delta.content` as accumulated full text instead of incremental suffixes. The Gemini stream pipeline appended every chunk verbatim, so the same content was concatenated repeatedly during streaming. Adds a per-stream `normalizeStreamingTextDelta` that detects cumulative-mode chunks (current chunk starts with previously emitted text) and emits only the suffix. Once cumulative mode is locked in, exact-repeat (≥64 chars) and prefix-overlap chunks are silenced. Normal incremental streams flow through unchanged. The fix applies to both `delta.content` and `delta.reasoning_content` / `delta.reasoning`, with separate state slots so a switch between text and reasoning channels does not cause false-positive suppression. A separate `emittedLength` counter tracks user-visible bytes so an incremental-then-cumulative hybrid stream slices the correct suffix even after the 1024-byte detection-window cap kicks in. Fixes #3279, #1184, #3838, #3806.
Summary
Some OpenAI-compatible upstreams (notably specific DashScope / 阿里云百炼 Coding Plan paths) send
delta.contentas accumulated full text instead of incremental suffixes. The Gemini stream pipeline appends every chunk verbatim, so the same content gets concatenated repeatedly during streaming — visible as Markdown / table content duplicating itself many times before the final chunk lands.This PR adds a per-stream
normalizeStreamingTextDeltathat detects cumulative-mode chunks (current chunk starts with previously emitted text) and emits only the suffix. Once cumulative mode is locked in, exact-repeat and prefix-overlap chunks are silenced. Normal incremental streams flow through unchanged.The fix applies to both
delta.contentanddelta.reasoning_content/delta.reasoning, with separate state slots so a switch between text and reasoning channels does not cause false-positive suppression.Why split out
Extracted from #3663 (umbrella TUI flicker / streaming stability PR — ~30 files / ~4300 LoC across UI bounding, scrollback artifact suppression, capture infra, etc.). This single-purpose PR isolates the provider-side normalization layer because:
packages/coreonly — zeroink/reactcoupling, unaffected by the in-flight ink 7 upgrade.8 → 112(amplification 14×). This PR alone drops that ratio back to 1.0×.The umbrella will be split into a series of small, independently demonstrable PRs; this is the first.
Issues this addresses
Family / closed but related context: #2402 (CLOSED, OpenRouter duplicate
finish_reasonchunks — different fix, same family of OpenAI-compat upstream quirks).Test plan
cd packages/core && npx vitest run src/core/openaiContentGenerator/converter.test.ts— 83 / 83 pass, including 3 new cases:should normalize cumulative streaming content deltas to suffixes— happy pathshould ignore repeated cumulative chunks with no new suffix— exact-repeat lock-inshould preserve repeated short incremental content chunks— false-positive guardnpm run typecheck --workspace=@qwen-code/qwen-code-core— cleannpm run lint --workspace=@qwen-code/qwen-code-core— cleanLive reproduction GIF
the following gif shows the duplicate output when the
delta.contentoutput the full content.Reproduction
On
mainwith a known-cumulative provider: streaming a long Markdown table prints the table contents N+(N-1)+(N-2)+… times during the live frame loop.On this branch: the same stream prints the contents exactly once, regardless of how many cumulative chunks the upstream emits.
For deterministic synthetic reproduction, the capture harness in #3663 (
integration-tests/terminal-capture/streaming-clear-storm.ts, payloadmarkdown-table-cumulative) is the canonical script — left out of this PR to keep the diff minimal but planned as a follow-up split.Follow-ups
Next planned splits from #3663 (each independently demonstrable, non-conflicting with the ink 7 upgrade):
fix(cli): tmux-safe dots spinner— addresses When performing tasks in the cmux terminal, the output content flashes #3330 family (spinner redraw pressure inside tmux multiplexers)fix(core): throttle shell tool live text updates— defensive perf for high-frequency shell command output