fix(core): never merge distinct parallel tool-call ids into one buffer#3521
fix(core): never merge distinct parallel tool-call ids into one buffer#3521zhangxy-zju wants to merge 1 commit into
Conversation
StreamingToolCallParser reused an incomplete buffer whenever a new,
distinct tool-call id arrived at an already-occupied `index`. Provider
quirks (DashScope/Qwen emits parallel tool calls all reporting
`index: 0` with different ids) triggered that path on every call,
interleaving arguments from separate calls into a single accumulator
— producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...`
that `jsonrepair` cannot fix. All affected tool calls get dropped,
and the subagent surfaces as `InvalidStreamError: NO_RESPONSE_TEXT`
(issue #3516 symptom; real root cause was here).
Fix: when a distinct id arrives at a populated index, unconditionally
allocate a fresh bucket via a new `allocateFreshIndex()` helper —
regardless of whether the existing call is complete or still
streaming. The previous `findNextAvailableIndex` helper treated
"incomplete" buckets as "available" and cannot be reused for this
purpose without reintroducing the merge.
A pre-existing test (`should reuse incomplete index when available`)
codified the buggy behavior as intended; it is flipped to assert the
corrected semantics. Two regression tests reproduce the 2-way and
3-way interleaved parallel tool call scenarios captured in real
debug logs.
Refs #3516
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
|
Closing this PR — the fix is necessary but insufficient. After deeper investigation, the real root cause is architectural and sits one layer above what this PR touches. A follow-up PR will replace it. What the reviewer flagged (thanks!)The request-changes feedback on continuation chunks without But the deeper issue
Code pointers:
When two streams run concurrently (two subagents, two parallel chat turns, two fork subagents, etc.) they race on the same parser: This is independent of whether any single stream is OpenAI-spec-compliant (distinct index per call). I empirically verified with The debug log that motivated #3516 revisited
The problem will get strictly worse once #3463 lands ACP Agent concurrency, because every parallel subagent execution now produces the exact shape above. Follow-up PRMoves the parser from Converter instance state into per-stream local state created inside // converter.ts — drop instance parser, add a factory + per-stream API
createStreamContext(): StreamContext {
return { toolCallParser: new StreamingToolCallParser() };
}
convertOpenAIChunkToGemini(chunk, ctx: StreamContext) { ... }
// pipeline.ts — per-stream context
private async *processStreamWithLogging(stream, context, request) {
const streamCtx = this.converter.createStreamContext();
for await (const chunk of stream) {
const response = this.converter.convertOpenAIChunkToGemini(chunk, streamCtx);
...
}
}Side effects:
Will link the new PR here once it's up. |
|
Superseded by upcoming PR — see previous comment for rationale. |
Summary
StreamingToolCallParsermerged chunks from distinct tool calls into a single buffer when a new tool-call id arrived at an already-occupiedindexwhose existing call was still streaming. Provider quirks — notably DashScope/Qwen parallel tool calls all reportingindex: 0with different ids — triggered that path on every parallel call, producing corrupt interleaved JSON like{"file_path": "/A{"file_path": "/B...thatjsonrepaircannot fix. All affected tool calls are then dropped entirely, and the subagent surfaces asInvalidStreamError: NO_RESPONSE_TEXTat the chat-stream boundary.This is the real root cause behind many of the "Failed to run subagent: Model stream ended with empty response text" reports tracked in #3516. On my local debug logs (23 sessions over several weeks), 100% of
NO_RESPONSE_TEXToccurrences were preceded by[JSON_PARSE] Failed to parse JSON even with jsonrepairstack traces pointing atStreamingToolCallParser.getCompletedToolCalls— zero were genuine "model had nothing to say" cases.Fix
When a distinct id arrives at an index that's already owned by a different id, always allocate a fresh bucket — whether the existing call is complete or still streaming. Previously the parser only relocated when the existing buffer successfully JSON-parsed; an incomplete buffer fell through to "reuse this index", concatenating the new call's arguments into the old one's accumulator.
Core change in
addChunk:A new
allocateFreshIndex()helper returns an index with no existing state at all. The previousfindNextAvailableIndexhelper treats "incomplete" buckets as "available" (designed for continuation-chunk routing) and cannot be reused here without reintroducing the merge.Tests
should reuse incomplete index when available→should never merge a new tool call id into an incomplete bucket. The original test explicitly codified the buggy behavior as intended (expected chunks from a new id to complete the previous call's JSON); the flipped test asserts two distinct buckets.index: 0with distinct ids and interleaved chunks end up in separate buckets and both close cleanly.converter.test.ts(67 tests) andpipeline.test.ts(29 tests) pass unchanged.Why not just tolerate
NO_RESPONSE_TEXTas legitimate?My earlier draft (
fix/accept-empty-end-turn-responses, not pushed) removed theNO_RESPONSE_TEXTthrow and accepted finish_reason + empty content as a valid no-op turn — matching Claude Code'sisResultSuccessfulcarve-out. That approach masks this parser bug: the subagent would silently "succeed" while its tool calls were being dropped, giving no signal to debug. The right fix is at the parser layer.NO_RESPONSE_TEXTremains valuable as a monitoring signal for genuine empty-response cases (if any); after this fix, the 23 sessions in my logs would have surfaced zero of them.Refs #3516
Related issues (likely same root cause)
The following closed issues report the same
Model stream ended with empty response text/ silently-dropped-tool-call symptoms under providers / models that match the parallel-tool-call-with-shared-index=0 pattern this PR addresses:enable_thinking: true; reporter hs-ye confirms qwen3-coder-plus works but qwen3-max consistently failsUsers on those threads may want to re-verify once this lands.