Skip to content

fix: prevent tool call loss from late-arriving names and duplicate finish chunks#2404

Open
drewd789 wants to merge 2 commits into
QwenLM:mainfrom
drewd789:fix/tool-call-name-after-json
Open

fix: prevent tool call loss from late-arriving names and duplicate finish chunks#2404
drewd789 wants to merge 2 commits into
QwenLM:mainfrom
drewd789:fix/tool-call-name-after-json

Conversation

@drewd789

Copy link
Copy Markdown
Contributor

Summary

Fixes three complementary bugs that cause tool calls to disappear during streaming:

  1. Parser bug: Function name arriving after JSON completion causes index reassignment
  2. Converter bug: Waiting until finish_reason to emit allows parser state changes to lose tool calls
  3. Pipeline bug: Duplicate finish_reason chunks cause second chunk to overwrite first chunk's tool calls

Root Causes

Bug 1: Parser Index Reassignment (streamingToolCallParser.ts)

findMostRecentIncompleteIndex() and findNextAvailableIndex() considered tool calls "complete" based on JSON state alone (depth=0, non-empty buffer, has ID), without checking if meta.name exists. When name arrived in a later chunk, parser reassigned it to a new index, leaving JSON at the old index → getCompletedToolCalls() returned empty.

Bug 2: Late Emission (converter.ts)

Tool calls were only emitted at finish_reason, not during streaming. This allowed parser state changes between JSON completion and name arrival to cause tool call loss.

Bug 3: Finish Chunk Overwriting (pipeline.ts)

handleChunkMerging() unconditionally stored every finish chunk. Some providers send finish_reason twice (first with tool calls, second with usage metadata only). Second chunk overwrote the first → tool calls lost.

Changes

streamingToolCallParser.ts

  • Check meta.name when determining if tool call is complete in findMostRecentIncompleteIndex()
  • Check meta.name when determining if index is available in findNextAvailableIndex()
  • Prevents index reassignment when name arrives late

converter.ts

  • Emit tool calls immediately during streaming when result.complete && result.value && meta.name
  • Track emitted tool call IDs in emittedToolCallIds Set
  • Skip already-emitted IDs at finish_reason to prevent duplicates
  • Clear emittedToolCallIds in resetStreamingToolCalls()

pipeline.ts

  • Skip finish chunks without tool calls when pending finish already has tool calls
  • Preserves first finish chunk's tool calls from being overwritten

Testing

Unit tests added for all three fixes:

Parser tests:

  • should keep name at correct index when it arrives after JSON
  • should not reassign index when name arrives after JSON
  • should handle multiple tool calls with late names

Converter tests:

  • should emit tool call immediately when JSON completes during streaming
  • should not emit duplicate tool calls at finish_reason
  • should handle multiple finish_reason chunks without duplicates

Pipeline tests:

  • should skip finish chunk without tool calls when pending finish has tool calls

All tests pass, confirming the fixes work correctly.

Related

These fixes work together for defense-in-depth:

  • Parser fix prevents index corruption
  • Converter fix emits early + prevents duplicates
  • Pipeline fix prevents finish chunk overwriting

Complements existing streaming handling. Works with providers that send chunks in non-standard order or send finish_reason multiple times.

@Mingholy

Copy link
Copy Markdown
Collaborator

Thanks for the contribution!
The fix is solid and the newly added test cases explained the root cause well. Further changes may be submitted directly to this PR to resolve merge conflicts after we've done running some validations.

@tanzhenxin

Copy link
Copy Markdown
Collaborator

@drewd789 Thanks for your contribution!

Regarding the issue #2402 , we favor PR #2403 as it is a targeted fix and we can easily verify. Though we agree that the changes in this implementation is more complete with better test coverage.

As to other changes, we need an issue so that we can reproduce it to verify. Again, we are hesitate to take any PR targeting core module without a reproducible issue, for stability consideration.

Thanks for your understanding.

@tanzhenxin tanzhenxin added the status/need-information More information is needed to resolve this issue. label Mar 18, 2026
Tests for all three tool call loss fixes:

**streamingToolCallParser.ts:**
- should keep name at correct index when it arrives after JSON
- should not reassign index when name arrives after JSON
- should handle multiple tool calls with late names

**converter.ts:**
- should emit tool call immediately when JSON completes during streaming
- should not emit duplicate tool calls at finish_reason
- should handle multiple finish_reason chunks without duplicates

**pipeline.ts:**
- should skip finish chunk without tool calls when pending finish has tool calls

These tests verify the fixes for:
1. Parser index reassignment when name arrives late
2. Late emission allowing parser state changes to lose tool calls
3. Finish chunk overwriting when API sends finish_reason twice
…nish chunks

Fixes three complementary bugs that cause tool calls to disappear during streaming:

**streamingToolCallParser.ts:**
- Check meta.name when determining if tool call is complete
- Prevents index reassignment when name arrives after JSON completion

**converter.ts:**
- Emit tool calls immediately during streaming when complete + has name
- Track emitted tool call IDs in emittedToolCallIds Set
- Skip already-emitted IDs at finish_reason to prevent duplicates
- Clear emittedToolCallIds in resetStreamingToolCalls()

**pipeline.ts:**
- Skip finish chunks without tool calls when pending finish already has them
- Prevents second finish chunk from overwriting first chunk's tool calls

Root causes:
1. Parser considered tool calls "complete" without checking for meta.name
2. Converter only emitted at finish_reason, allowing parser state changes to lose tool calls
3. Pipeline unconditionally stored every finish chunk, overwriting previous ones
@drewd789 drewd789 force-pushed the fix/tool-call-name-after-json branch from 55dd9ac to 59048af Compare April 18, 2026 03:30
openaiResponse: OpenAI.Chat.ChatCompletion,
): GenerateContentResponse {
const choice = openaiResponse.choices?.[0];
const choice = openaiResponse.choices[0];

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] convertOpenAIResponseToGemini() now unconditionally dereferences openaiResponse.choices[0]. The previous code handled an empty/missing first choice and returned an empty Gemini response, so an OpenAI-compatible response with choices: [] will now throw before usage metadata can be converted.

Suggested change
const choice = openaiResponse.choices[0];
const choice = openaiResponse.choices?.[0];
const response = new GenerateContentResponse();
if (!choice) {
response.candidates = [];
response.responseId = openaiResponse.id;
response.createTime = openaiResponse.created
? openaiResponse.created.toString()
: new Date().getTime().toString();
response.modelVersion = this.model;
response.promptFeedback = { safetyRatings: [] };
return response;
}

— gpt-5.5 via Qwen Code /review

return false;
}
collectedGeminiResponses.push(response);
setPendingFinish(response);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] This branch preserves the earlier finish chunk with tool calls, but it also drops response.usageMetadata from the later duplicate finish chunk. That breaks providers that send tool calls on the first finish chunk and token usage on a later finish/usage chunk.

Suggested change
setPendingFinish(response);
if (hasPendingFinish && pendingToolCallCount > 0) {
if (response.usageMetadata) {
lastCollected.usageMetadata = response.usageMetadata;
setPendingFinish(lastCollected);
}
return false;
}

— gpt-5.5 via Qwen Code /review

if (meta.name) {
const id =
meta.id ||
`call_${Date.now()}_${Math.random().toString(36).substring(2, 9)}`;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] addChunk() can remap the provider-supplied index to a different internal parser index, but early emission still reads metadata with the original index. If a provider reuses index 0, this can pair the new arguments with stale metadata, skip the completed call, or emit it again at finish. Please have addChunk() return the resolved internal index or metadata, then use that same key for early emission and duplicate tracking.

— gpt-5.5 via Qwen Code /review

// Skip if already emitted during streaming (prevents duplicates)
if (toolCall.id && this.emittedToolCallIds.has(toolCall.id)) {
continue;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Duplicate suppression only tracks provider IDs, but tool-call IDs are optional in this path. For an id-less call emitted early, no stable emitted key is recorded; when finish_reason arrives, getCompletedToolCalls() can emit the same parser entry again with a different synthetic ID. Track emitted parser/internal indices in addition to provider IDs, or expose a stable parser key for each completed call.

— gpt-5.5 via Qwen Code /review

if (!buffer.trim() || depth > 0 || !meta?.id) {
// Tool call is incomplete if: JSON open, buffer empty, missing ID, or missing name
// Any condition true → index available for reuse
if (!buffer.trim() || depth > 0 || !meta?.id || !meta.name) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Treating a slot with complete JSON and an ID as reusable just because the function name has not arrived yet can corrupt multi-tool streams. If call_1 sends complete arguments before its name, then call_2 starts on the same provider index, findNextAvailableIndex() can return the occupied slot and append/overwrite the first call’s state. Keep such slots occupied for collision allocation; late-name routing should be handled separately.

Suggested change
if (!buffer.trim() || depth > 0 || !meta?.id || !meta.name) {
if (!buffer.trim() || depth > 0 || !meta?.id) {
return this.nextAvailableIndex;
}

— gpt-5.5 via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scope/content-generation AI content generation status/need-information More information is needed to resolve this issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants