fix(pipeline): handle duplicate finish_reason chunks from OpenRouter#2403

Merged
tanzhenxin merged 1 commit into QwenLM:main from simon100500:fix/duplicate-finish-chunk-tool-calls
Mar 18, 2026

@simon100500

Contributor

Fixes #2402

Problem

Some OpenRouter model providers (e.g. google/gemini-3.1-flash-lite-preview) send two consecutive SSE chunks with finish_reason: "tool_calls". The second chunk arrives after streamingToolCallParser.reset() has already been called, so it carries empty parts — no functionCall entries.

handleChunkMerging treated every finish chunk as authoritative and overwrote pendingFinishResponse with the empty duplicate, discarding the functionCall parts correctly assembled from the first finish chunk.

This caused processStreamResponse to see hasToolCall=false and throw:

Model stream ended with empty response text.
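The failure can be reproduced with a minimal TypeScript sketch of the overwrite behavior. `Part`, `Candidate`, `GeminiResponse`, `naiveMerge`, and `hasToolCall` are hypothetical, simplified stand-ins, not the pipeline's real types or functions. The `read_file` tool call is made-up sample data, and `finishReason` keeps the OpenAI-style `"tool_calls"` value only for readability.

```typescript
// Hypothetical, simplified stand-ins for the pipeline's response types.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}

interface Candidate {
  content: { parts: Part[] };
  finishReason?: string; // sketch keeps the OpenAI-style value, e.g. "tool_calls"
}

interface GeminiResponse {
  candidates: Candidate[];
  usageMetadata?: { totalTokenCount?: number };
}

// A chunk counts as a finish chunk if any candidate has a finish reason.
function isFinishChunk(response: GeminiResponse): boolean {
  return response.candidates.some((c) => c.finishReason !== undefined);
}

// Models the original behavior: every finish chunk replaces the pending one.
function naiveMerge(chunks: GeminiResponse[]): GeminiResponse | undefined {
  let pendingFinishResponse: GeminiResponse | undefined = undefined;
  for (const chunk of chunks) {
    if (isFinishChunk(chunk)) {
      pendingFinishResponse = chunk; // the later (empty) duplicate wins
    }
  }
  return pendingFinishResponse;
}

// True if any candidate carries a functionCall part.
function hasToolCall(response: GeminiResponse | undefined): boolean {
  return (
    response?.candidates.some((c) =>
      c.content.parts.some((p) => p.functionCall !== undefined),
    ) ?? false
  );
}

// First finish chunk: carries the tool call assembled by the streaming parser.
const firstFinish: GeminiResponse = {
  candidates: [
    {
      content: {
        parts: [{ functionCall: { name: "read_file", args: { path: "a.ts" } } }],
      },
      finishReason: "tool_calls",
    },
  ],
};

// Duplicate finish chunk: arrives after the parser reset, so parts is empty.
const duplicateFinish: GeminiResponse = {
  candidates: [{ content: { parts: [] }, finishReason: "tool_calls" }],
  usageMetadata: { totalTokenCount: 42 },
};

const merged = naiveMerge([firstFinish, duplicateFinish]);
console.log(hasToolCall(firstFinish)); // true
console.log(hasToolCall(merged)); // false: the tool call was discarded
```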

Fix

In handleChunkMerging, when a second finish chunk arrives while a pendingFinishResponse already exists, merge only its usageMetadata (if present) and keep the candidates from the first finish chunk.

if (isFinishChunk) {
  if (hasPendingFinish) {
    // Duplicate finish chunk — keep candidates from first, merge only metadata
    const lastResponse = collectedGeminiResponses[...];
    if (response.usageMetadata) lastResponse.usageMetadata = response.usageMetadata;
    setPendingFinish(lastResponse);
  } else {
    collectedGeminiResponses.push(response);
    setPendingFinish(response);
  }
  return false;
}
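Below is a self-contained TypeScript sketch of the same rule. It assumes simplified, hypothetical types and a hypothetical `ChunkMerger` class in place of the pipeline's real state (`collectedGeminiResponses`, `setPendingFinish`). Instead of looking up the last collected response, it merges metadata directly into the stored pending finish response, which is the same object that was pushed when the first finish chunk arrived. How non-finish chunks are handled and what the boolean return value means are also assumptions of this sketch.

```typescript
// Hypothetical, simplified stand-ins for the pipeline's response types.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}

interface Candidate {
  content: { parts: Part[] };
  finishReason?: string;
}

interface GeminiResponse {
  candidates: Candidate[];
  usageMetadata?: { promptTokenCount?: number; totalTokenCount?: number };
}

// Hypothetical model of the fixed merging state; not the pipeline's real class.
class ChunkMerger {
  readonly collectedGeminiResponses: GeminiResponse[] = [];
  pendingFinishResponse: GeminiResponse | undefined = undefined;

  // Returns false when the chunk is held back (as the snippet does for finish
  // chunks); returning true for other chunks is an assumption of this sketch.
  handleChunk(response: GeminiResponse): boolean {
    const isFinishChunk = response.candidates.some(
      (c) => c.finishReason !== undefined,
    );
    if (!isFinishChunk) {
      this.collectedGeminiResponses.push(response);
      return true;
    }
    if (this.pendingFinishResponse) {
      // Duplicate finish chunk: keep the first chunk's candidates and copy
      // over usage metadata only if the duplicate carries it.
      if (response.usageMetadata) {
        this.pendingFinishResponse.usageMetadata = response.usageMetadata;
      }
    } else {
      // First finish chunk: record it as the authoritative final response.
      this.collectedGeminiResponses.push(response);
      this.pendingFinishResponse = response;
    }
    return false;
  }
}

const merger = new ChunkMerger();

// First finish chunk: carries the assembled tool call (sample data).
merger.handleChunk({
  candidates: [
    {
      content: {
        parts: [{ functionCall: { name: "read_file", args: { path: "a.ts" } } }],
      },
      finishReason: "tool_calls",
    },
  ],
});

// Duplicate finish chunk: empty parts, but carries usage metadata.
merger.handleChunk({
  candidates: [{ content: { parts: [] }, finishReason: "tool_calls" }],
  usageMetadata: { promptTokenCount: 10, totalTokenCount: 42 },
});

const finalResponse = merger.pendingFinishResponse;
console.log(finalResponse?.candidates[0].content.parts.length); // 1
console.log(finalResponse?.usageMetadata?.totalTokenCount); // 42
```

Because the duplicate is never pushed into `collectedGeminiResponses`, the final response keeps the first chunk's `functionCall` parts while still picking up token counts that may only arrive on the later chunk.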

Testing

The existing pipeline.test.ts suite should cover regressions. A new test case can be added for the duplicate-finish-chunk scenario if desired.

Some OpenRouter model providers (e.g. google/gemini-3.1-flash-lite-preview)
send two consecutive SSE chunks with finish_reason='tool_calls'. The second
chunk arrives after streamingToolCallParser.reset() has been called, so it
carries empty parts — no functionCall entries.

The original handleChunkMerging treated every finish chunk as authoritative
and overwrote pendingFinishResponse, discarding the functionCall parts that
were correctly assembled from the first finish chunk.

Fix: when a second finish chunk arrives and a pendingFinishResponse already
exists, only merge usageMetadata (if present) and keep the candidates from
the first finish chunk.

Mingholy added the scope/content-generation (AI content generation) label Mar 16, 2026
@Mingholy

Collaborator

Thanks for the contribution!
This is a core change, and it conflicts with #2404. I'm merging both into a single test branch for validation. This may take some time; the PR will be merged once validation passes.

Mingholy self-assigned this Mar 16, 2026
tanzhenxin linked an issue that may be closed by this pull request Mar 18, 2026
tanzhenxin merged commit a60fadd into QwenLM:main Mar 18, 2026
15 checks passed
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
…chunk-tool-calls

fix(pipeline): handle duplicate finish_reason chunks from OpenRouter

Labels

scope/content-generation AI content generation

3 participants