fix(core): scope StreamingToolCallParser per stream, not per Converter (#3516) #3525
Issue #3516 reports subagent failures with `Model stream ended with empty response text` whose real root cause is concurrent streams racing on a single shared tool-call parser.

Architecture before this change:

Config (singleton)
└── contentGenerator (OpenAIContentGenerator)
    └── ContentGenerationPipeline
        └── OpenAIContentConverter
            └── streamingToolCallParser ← shared!

Any caller of `Config.getContentGenerator()` (foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls from PR #3463) ends up using the same parser instance. When two streams run concurrently, `processStreamWithLogging`'s stream-start `resetStreamingToolCalls()` wipes the other stream's in-flight buffers, and their chunks interleave at `index: 0`, producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...` that even jsonrepair cannot salvage. The corrupted tool calls are dropped entirely and the stream surfaces upstream as `NO_RESPONSE_TEXT`.

Fix: move parser state from Converter instance field into per-stream local state.

- Add `ConverterStreamContext` and `createStreamContext()` factory on `OpenAIContentConverter`. Each call returns a fresh context holding its own `StreamingToolCallParser`.
- `convertOpenAIChunkToGemini(chunk, ctx)` now takes the context as an explicit arg; all internal parser calls route through it.
- `ContentGenerationPipeline.processStreamWithLogging` creates one context at stream entry and passes it to every chunk conversion.
- Drop `OpenAIContentConverter.streamingToolCallParser` field.
- Drop `resetStreamingToolCalls()`: the context has stream-local lifetime, so no manual reset is needed. The two call sites in the pipeline (stream entry and error path) are removed.

Tests:

- Replace the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other.
- Add a regression test simulating two concurrent streams with interleaved chunks through the same Converter instance; both tool calls close cleanly with correct arguments and ids.
- All existing single-stream tests updated to obtain a context via `createStreamContext()` and pass it through to chunk conversion.
- `pipeline.test.ts` mocks updated accordingly.

packages/core test suite: 841 passed. No stale references to `resetStreamingToolCalls` or the private parser field remain.

Refs #3516
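The race and the fix described above can be illustrated with a minimal sketch. `ToyToolCallParser`, `ToyStreamContext`, and `createToyStreamContext` are hypothetical stand-ins invented for this example; the real `StreamingToolCallParser` and `ConverterStreamContext` are likely richer. The sketch only shows why a buffer keyed by `index: 0` corrupts when shared, and why one context per stream avoids that.

```typescript
// Hypothetical, simplified stand-in for StreamingToolCallParser:
// it only concatenates argument fragments per tool-call index.
class ToyToolCallParser {
  private buffers = new Map<number, string>();

  addChunk(index: number, fragment: string): void {
    this.buffers.set(index, (this.buffers.get(index) ?? '') + fragment);
  }

  getBuffer(index: number): string {
    return this.buffers.get(index) ?? '';
  }
}

// Hypothetical per-stream context, loosely mirroring the shape named in the PR.
interface ToyStreamContext {
  toolCallParser: ToyToolCallParser;
}

function createToyStreamContext(): ToyStreamContext {
  return { toolCallParser: new ToyToolCallParser() };
}

// Chunks from two streams arriving interleaved, both at tool-call index 0.
const interleaved: Array<{ stream: 'A' | 'B'; fragment: string }> = [
  { stream: 'A', fragment: '{"file_path": "/A' },
  { stream: 'B', fragment: '{"file_path": "/B' },
  { stream: 'A', fragment: '"}' },
  { stream: 'B', fragment: '"}' },
];

// Before: one parser shared by both streams, so fragments mix in one buffer.
const shared = new ToyToolCallParser();
for (const { fragment } of interleaved) {
  shared.addChunk(0, fragment);
}
const sharedResult = shared.getBuffer(0);

// After: each stream owns its own context, so buffers stay separate.
const contexts = { A: createToyStreamContext(), B: createToyStreamContext() };
for (const { stream, fragment } of interleaved) {
  contexts[stream].toolCallParser.addChunk(0, fragment);
}
const resultA = contexts.A.toolCallParser.getBuffer(0);
const resultB = contexts.B.toolCallParser.getBuffer(0);
```

In this toy model the shared buffer is not parseable JSON, while each per-stream buffer parses to its own `file_path`.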
Pull request overview
This PR fixes intermittent streaming failures in packages/core by scoping StreamingToolCallParser state to a single OpenAI streaming response (per stream) rather than sharing it across the OpenAIContentConverter instance, preventing cross-stream tool-call buffer corruption during concurrent subagent/parallel agent execution.
Changes:
- Introduce ConverterStreamContext + createStreamContext() to allocate per-stream tool-call parsing state.
- Update the streaming pipeline to create and reuse a single stream context for all chunks in a stream.
- Replace reset-related tests with new coverage proving interleaved concurrent streams demux correctly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/core/src/core/openaiContentGenerator/pipeline.ts | Creates a per-stream context and passes it through chunk conversion; removes shared-parser resets. |
| packages/core/src/core/openaiContentGenerator/pipeline.test.ts | Updates pipeline mocks/assertions to expect per-stream context creation instead of reset calls. |
| packages/core/src/core/openaiContentGenerator/converter.ts | Removes converter-level parser singleton; adds createStreamContext() and threads context through convertOpenAIChunkToGemini. |
| packages/core/src/core/openaiContentGenerator/converter.test.ts | Reworks tests around createStreamContext() and adds regression coverage for interleaved concurrent streams (#3516). |
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
Complements the unit tests in converter.test.ts by driving the real ContentGenerationPipeline + real OpenAIContentConverter (no mocks on converter) through two streams that interleave on the event loop via `setImmediate`-paced async generators.

Two scenarios:

1. Happy path: two concurrent executeStream invocations with their own tool-call chunks. Assert each stream emits its own function call with the correct id and args (not cross-contaminated from the sibling stream).
2. Error isolation: one stream hits `error_finish` mid-flight while a sibling stream is still accumulating tool-call chunks. Assert the sibling's function call still emits cleanly, covering the removed `resetStreamingToolCalls()` call in the error path of processStreamWithLogging.

Verified as a positive control: with the per-stream context fix reverted (origin/main state), both tests fail with exactly the bug shape users reported. One stream's function call is either overwritten by the other's id/args, or is swallowed entirely when the sibling stream's error path wipes the shared parser buffer.

Refs #3516
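The `setImmediate` pacing technique mentioned above can be sketched as follows. This is not the actual test from the PR; `paced`, `consume`, and `arrivalOrder` are hypothetical names, and the consumer simply concatenates fragments in stream-local state rather than calling the real pipeline.

```typescript
// Yields each item only after giving the event loop a turn, so two
// generators consumed concurrently take turns instead of running back to back.
async function* paced<T>(items: T[]): AsyncGenerator<T> {
  for (const item of items) {
    await new Promise<void>((resolve) => setImmediate(resolve));
    yield item;
  }
}

// Records which stream each chunk belonged to, in global arrival order.
const arrivalOrder: string[] = [];

// Hypothetical consumer: accumulates fragments in a stream-local buffer,
// playing the role of a per-stream context.
async function consume(name: string, fragments: string[]): Promise<string> {
  let buffer = '';
  for await (const fragment of paced(fragments)) {
    arrivalOrder.push(name);
    buffer += fragment;
  }
  return buffer;
}

// Start both streams before awaiting either, so their chunks interleave.
const bothStreams = Promise.all([
  consume('A', ['{"file_path":', ' "/A"}']),
  consume('B', ['{"file_path":', ' "/B"}']),
]);
```

Because each consumer keeps its own buffer, interleaved arrival does not mix the two payloads; a shared buffer in `consume` would reproduce the corruption the test guards against.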
tanzhenxin
left a comment
Review
Correctly diagnoses and fixes the architectural cause of #3516. Scoping StreamingToolCallParser to a per-stream ConverterStreamContext (created once in processStreamWithLogging and threaded through every chunk conversion) eliminates the cross-stream races that dropped or corrupted tool calls under parallel subagents, run_in_background subagents, and ACP concurrent Agent calls. The diff is surgical and complete — no stale references to resetStreamingToolCalls, one call site updated, and the new interleaved-streams regression test exercises the exact failure shape.
A couple of non-blocking follow-up notes:
- The duplicate-finish comment at pipeline.ts:284-288 still says "the streaming tool call parser was already reset after the first finish chunk"; no reset happens anymore, and the context was already drained on the first finish. The conclusion is right but the mechanism is stale.
- Pre-existing and out of scope: OpenAIContentConverter still has shared mutable model/modalities/schemaCompliance instance fields that could momentarily cross-contaminate response.modelVersion under concurrent streams targeting different models. Worth a follow-up if parallel subagents can ever run against different effective models, but much lower impact than the parser bug.
Verdict
APPROVE — the fix targets the real root cause, is complete and minimal, and the regression test directly exercises the failure mode.
End-to-end validation
Ran Session Before vs after
Both sessions run
All 5 review agents completed cleanly, plus the two follow-up agents the skill launches in Steps 5 and 6 (verification + reverse audit). Only warnings in the debug log were 2 unrelated rate-limit throttling retries. Together with the unit tests + the pipeline-level integration test ( |
…te races

Follow-up to QwenLM#3525. QwenLM#3516 showed that OpenAIContentConverter's long-lived per-pipeline state raced between concurrent streams; QwenLM#3525 scoped the streaming tool-call parser, this removes the remaining shared state.

- OpenAIContentConverter is now a module of stand-alone functions; the exported symbol is a namespace object preserved for call-site compatibility.
- New RequestContext (in types.ts, alongside PipelineConfig and ErrorHandler) carries model, modalities, startTime, and an optional per-stream toolCallParser. The pipeline builds one per request and threads it through every conversion call.
- errorHandler drops duration/isStreaming; duration is recomputed from startTime at error time and troubleshooting text is uniform.
- convertOpenAIChunkToGemini now throws if toolCallParser is missing so future misuse surfaces loudly instead of silently constructing a one-shot parser per chunk.
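A rough sketch of the request-context idea from this commit message follows. The field names come from the message, but their types, the `ToyParser` interface, and the helper names `convertChunk` and `durationMs` are assumptions made for illustration, not the actual code in types.ts.

```typescript
// Hypothetical minimal parser interface; the real parser has a richer API.
interface ToyParser {
  addChunk(index: number, fragment: string): void;
}

// Field names follow the commit message; the types are assumptions.
interface RequestContext {
  model: string;
  modalities: Record<string, boolean>;
  startTime: number;
  toolCallParser?: ToyParser;
}

// Streaming conversion fails loudly when no per-stream parser was supplied,
// instead of silently creating a throwaway parser for each chunk.
function convertChunk(fragment: string, ctx: RequestContext): void {
  if (!ctx.toolCallParser) {
    throw new Error('toolCallParser is required for streaming conversion');
  }
  ctx.toolCallParser.addChunk(0, fragment);
}

// Duration is derived from startTime when needed rather than passed around.
function durationMs(ctx: RequestContext, now: number = Date.now()): number {
  return now - ctx.startTime;
}

// Usage: one context per request, threaded through every conversion call.
const received: string[] = [];
const streamingCtx: RequestContext = {
  model: 'example-model',
  modalities: {},
  startTime: Date.now(),
  toolCallParser: { addChunk: (_index, fragment) => received.push(fragment) },
};
convertChunk('{"a":1}', streamingCtx);
```

Keeping all per-request state in one object means the conversion functions themselves hold nothing that two concurrent streams could race on.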
QwenLM#3516) (QwenLM#3525) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#3516) (#3525)

* fix(core): scope StreamingToolCallParser per stream, not per Converter
* docs(core): clarify GC wording in per-stream context comment (copilot review)
* test(core): add pipeline-level integration test for concurrent streams
…3550)

* refactor(core): make OpenAI converter stateless to prevent shared-state races
* test(core): align timeout expectations
* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320)

Cherry-picked from QwenLM/qwen-code:

- QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec.
- QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users.
- QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525)

Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state.

Bug: every caller of Config.getContentGenerator() (foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls after QwenLM#3463) shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT.

Fix:

- New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag, one per stream.
- createStreamContext() factory replaces resetStreamingToolCalls().
- convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access.
- ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets; the context is GC'd when the generator unwinds.

Our protoInternal recovery-note logic is preserved on the new shape.

Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred. It's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race.

Tests:

- New createStreamContext describe replaces resetStreamingToolCalls suite
- Streaming <think> tests use a per-test context
- pipeline.test.ts mock updated to match the new API
- pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape).

Refs upstream QwenLM#3516, QwenLM#3525.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
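The QwenLM#3630 item above concerns serializing objects that contain a reference cycle. The following is a minimal sketch of one common way to make `JSON.stringify` tolerate cycles; `toySafeStringify` is a hypothetical helper written for this example, and the project's actual `safeJsonStringify` may behave differently. The `processor`/`span` objects only imitate the back-reference shape the commit describes.

```typescript
// Hypothetical cycle-tolerant serializer. It replaces any object it has
// already visited with the string "[Circular]". Note this simple approach
// also replaces repeated (non-cyclic) references to the same object.
function toySafeStringify(value: unknown): string {
  const seen = new WeakSet<object>();
  return JSON.stringify(value, (_key, val: unknown) => {
    if (typeof val === 'object' && val !== null) {
      if (seen.has(val)) {
        return '[Circular]';
      }
      seen.add(val);
    }
    return val;
  });
}

// A span holding a processor that holds the span again: a reference cycle.
const processor: { name: string; spans: unknown[] } = { name: 'batch', spans: [] };
const span = { id: 'span-1', processor };
processor.spans.push(span);

// Plain JSON.stringify(span) would throw a TypeError on this structure.
const serialized = toySafeStringify(span);
```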
…enLM#3590, QwenLM#3505, QwenLM#3467) (#113)

* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320)

* fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525)

* fix(core): strip thinking blocks from history on model switch (QwenLM#3304)

When switching models mid-session, reasoning_content fields from thinking-capable models leaked into API requests sent to the new provider, causing 422 errors on strict OpenAI-compatible endpoints. Call stripThoughtsFromHistory() in handleModelChange() so thought parts are removed before the next request is built for the new model.

* fix(core): reject truncated subagent write_file calls (QwenLM#3505)

Backport of upstream QwenLM#3505. Propagates MAX_TOKENS truncation from subagent responses into tool requests and rejects truncated edit calls before schema validation can surface misleading missing-parameter errors. Adapted to our fork's coreToolScheduler.ts, which already had the truncation rejection block: kept both, dropped the unused clearRetryCountsForTool() call (we don't have that retry-counter machinery yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467)

Backport of upstream QwenLM#3467. A permission rule with unbalanced parens was silently parsed with specifier: undefined, causing matchesRule to treat it as a catch-all. For deny rules this blocked all commands; for allow rules a typo could silently auto-approve everything.

- Adds an invalid flag to PermissionRule
- parseRule marks unbalanced-paren rules as invalid
- matchesRule short-circuits invalid rules to never match
- parseRules / addSession*Rule / addPersistentRule warn on malformed input
- listRules filters invalid rules from /permissions UI

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(core): preserve reasoning_content during session resume and active sessions (GH#3579)

* test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test

The test from upstream QwenLM#3304 backport assumed an in-place qwen-oauth model switch path that our fork doesn't have; the source-side fix in config.ts (stripThoughtsFromHistory call in handleModelChange) is preserved. Coverage will be re-added when the fork's switch flow stabilizes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com>
Co-authored-by: Yuchen Fu <fuyuchen0904@163.com>
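The permission-rule fix (QwenLM#3467) can be sketched roughly as below. The `ToolName(specifier)` rule syntax, the prefix-based matching, and the names `ToyRule`, `parseToyRule`, and `matchesToyRule` are all assumptions for illustration; only the `invalid` flag and the never-match short-circuit come from the commit message.

```typescript
// Hypothetical rule shape; the real PermissionRule has more fields.
interface ToyRule {
  toolName: string;
  specifier?: string;
  invalid?: boolean;
}

// Assumed syntax: "Tool" (tool-wide) or "Tool(specifier)".
function parseToyRule(raw: string): ToyRule {
  const trimmed = raw.trim();
  const open = trimmed.indexOf('(');
  if (open === -1) {
    // A stray ')' without '(' is also unbalanced.
    return trimmed.includes(')')
      ? { toolName: trimmed, invalid: true }
      : { toolName: trimmed };
  }
  const toolName = trimmed.slice(0, open);
  if (!trimmed.endsWith(')')) {
    // Unbalanced: flag it instead of leaving specifier undefined,
    // which would read as a tool-wide catch-all.
    return { toolName, invalid: true };
  }
  return { toolName, specifier: trimmed.slice(open + 1, -1) };
}

function matchesToyRule(rule: ToyRule, toolName: string, command: string): boolean {
  if (rule.invalid) return false; // invalid rules never match
  if (rule.toolName !== toolName) return false;
  if (rule.specifier === undefined) return true; // intentional tool-wide rule
  return command.startsWith(rule.specifier); // assumed prefix semantics
}

const typoRule = parseToyRule('Bash(rm -rf');
const wideRule = parseToyRule('Bash');
const scopedRule = parseToyRule('Bash(git status)');
```

With the flag in place, the typo rule matches nothing, while a deliberately tool-wide rule still matches every command for that tool.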
…wenLM#3574) (#115)

* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320)

* fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525)

* fix(core): strip thinking blocks from history on model switch (QwenLM#3304)

* fix(core): reject truncated subagent write_file calls (QwenLM#3505)

* fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467)

* fix(core): preserve reasoning_content during session resume and active sessions (GH#3579)

* test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test

* fix(acp): run Agent tool calls concurrently + graceful degrade (QwenLM#3463)

Backport of upstream QwenLM#3463. When the model returns multiple Agent tool calls in a single turn, ACP Session was executing them sequentially in a for-loop, multiplying latency by sub-agent count.

- Add private runToolCalls() helper that mirrors coreToolScheduler's partition logic: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); other tools form sequential batches.
- Replace 2 for-loops in Session.ts with runToolCalls() calls.
- Switch the AgentTool eventEmitter guard from key-presence check to truthy check (commit 651979c). The key-presence check passed for { eventEmitter: undefined } and crashed inside SubAgentTracker.setup.

Note: upstream replaced 3 for-loops; our fork only had 2 in those code paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(acp): support SSE and HTTP MCP servers in ACP mode

In ACP mode, the MCP server list sent by the IDE client can include SSE (type: "sse") and HTTP (type: "http") transports, but the previous implementation only handled stdio servers via toStdioServer(). Non-stdio servers were silently skipped (continue), so any SSE/HTTP-configured MCP server would never be registered.

Changes:

- Add toSseServer() helper: detects type=="sse" servers and maps them to MCPServerConfig(url=..., headers=...)
- Add toHttpServer() helper: detects type=="http" servers and maps them to MCPServerConfig(httpUrl=..., headers=...)
- Refactor newSessionConfig() loop to handle all three transport types
- Declare mcpCapabilities: { sse: true, http: true } in agentCapabilities so IDE clients know this agent supports these transports without needing a transparent proxy
- Export the three helper functions for unit testing

Tests:

- Unit tests for toStdioServer / toSseServer / toHttpServer helpers (type discrimination, mutual exclusion)
- Integration-style tests for QwenAgent.initialize() mcpCapabilities
- Integration-style tests for newSession() with SSE/HTTP MCP servers, verifying MCPServerConfig is constructed with the correct arguments (url vs httpUrl, headers passthrough, empty-headers → undefined)

Fixes QwenLM#3472

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com>
Co-authored-by: Yuchen Fu <fuyuchen0904@163.com>
Co-authored-by: LaZzyMan <zeusdream7@gmail.com>
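The partition logic described for runToolCalls() (consecutive Agent calls run as one parallel batch, everything else runs sequentially) might look roughly like the sketch below. `ToyToolCall`, `ToyBatch`, `partitionToolCalls`, `runToyToolCalls`, and the `'agent'` tool name are hypothetical; the actual helper in Session.ts and the scheduler's partitioning may differ in detail.

```typescript
interface ToyToolCall {
  id: string;
  name: string;
}

interface ToyBatch {
  parallel: boolean;
  calls: ToyToolCall[];
}

// Groups consecutive calls of the same kind: runs of Agent calls become one
// parallel batch; runs of other tools become one sequential batch.
function partitionToolCalls(calls: ToyToolCall[], agentToolName = 'agent'): ToyBatch[] {
  const batches: ToyBatch[] = [];
  for (const call of calls) {
    const parallel = call.name === agentToolName;
    const last = batches[batches.length - 1];
    if (last && last.parallel === parallel) {
      last.calls.push(call);
    } else {
      batches.push({ parallel, calls: [call] });
    }
  }
  return batches;
}

// Executes batches in order; only Agent batches fan out concurrently.
async function runToyToolCalls(
  calls: ToyToolCall[],
  execute: (call: ToyToolCall) => Promise<void>,
): Promise<void> {
  for (const batch of partitionToolCalls(calls)) {
    if (batch.parallel) {
      await Promise.all(batch.calls.map((call) => execute(call)));
    } else {
      for (const call of batch.calls) {
        await execute(call);
      }
    }
  }
}

const exampleBatches = partitionToolCalls([
  { id: '1', name: 'read_file' },
  { id: '2', name: 'agent' },
  { id: '3', name: 'agent' },
  { id: '4', name: 'edit' },
]);
```

Batches still run one after another, so ordering between an Agent batch and the tools around it is preserved; only the Agent calls inside a batch overlap.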
* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320)

* fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525)

* fix(core): strip thinking blocks from history on model switch (QwenLM#3304)

* fix(core): reject truncated subagent write_file calls (QwenLM#3505)

* fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467)

* fix(core): preserve reasoning_content during session resume and active sessions (GH#3579)

* test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test

* fix(acp): run Agent tool calls concurrently + graceful degrade (QwenLM#3463)

* fix(acp): support SSE and HTTP MCP servers in ACP mode

* fix(openai): when samplingParams is set, pass it through verbatim

Previously pipeline.ts always hardcoded max_tokens as the output-token parameter name on the OpenAI-compatible path, falling back from samplingParams.max_tokens to request.config.maxOutputTokens to provider defaults. This broke GPT-5 / o-series on OpenAI and Azure OpenAI, which require max_completion_tokens and reject max_tokens with a 400 error.

Fix: when the user provides samplingParams explicitly, treat it as the complete source of truth for the wire shape and pass its keys through verbatim.
No client-injected defaults, no request fallbacks, no hardcoded parameter names. The user describes what the provider wants; the client trusts them. When samplingParams is absent, the historical default behavior (request fallback through temperature/top_p/.../max_tokens plus provider defaults) is preserved unchanged — existing users see no difference. Concretely, users can now set any of: samplingParams: { max_tokens: 4096 } # GPT-4 / Qwen / DeepSeek samplingParams: { max_completion_tokens: 4096 } # GPT-5 / o-series samplingParams: { reasoning_effort: 'medium' } # future knobs without waiting for a qwen-code release that adds model-specific branches. Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com> --------- Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com> Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com> Co-authored-by: Yuchen Fu <fuyuchen0904@163.com> Co-authored-by: LaZzyMan <zeusdream7@gmail.com> Co-authored-by: Gordon Lam (SH) <yeelam@microsoft.com>
QwenLM#3516) (QwenLM#3525) * fix(core): scope StreamingToolCallParser per stream, not per Converter Issue QwenLM#3516 reports subagent failures with `Model stream ended with empty response text` whose real root cause is concurrent streams racing on a single shared tool-call parser. Architecture before this change: Config (singleton) └── contentGenerator (OpenAIContentGenerator) └── ContentGenerationPipeline └── OpenAIContentConverter └── streamingToolCallParser ← shared! Any caller of `Config.getContentGenerator()` — foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls (PR QwenLM#3463) — ends up using the same parser instance. When two streams run concurrently, `processStreamWithLogging`'s stream-start `resetStreamingToolCalls()` wipes the other stream's in-flight buffers, and their chunks interleave at `index: 0`, producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...` that even jsonrepair cannot salvage. The corrupted tool calls are dropped entirely and the stream surfaces upstream as `NO_RESPONSE_TEXT`. Fix: move parser state from Converter instance field into per-stream local state. - Add `ConverterStreamContext` and `createStreamContext()` factory on `OpenAIContentConverter`. Each call returns a fresh context holding its own `StreamingToolCallParser`. - `convertOpenAIChunkToGemini(chunk, ctx)` now takes the context as an explicit arg; all internal parser calls route through it. - `ContentGenerationPipeline.processStreamWithLogging` creates one context at stream entry and passes it to every chunk conversion. - Drop `OpenAIContentConverter.streamingToolCallParser` field. - Drop `resetStreamingToolCalls()` — the context has stream-local lifetime, no manual reset needed. The two call sites in the pipeline (stream entry and error path) are removed. 
Tests: - Replace the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other. - Add a regression test simulating two concurrent streams with interleaved chunks through the same Converter instance; both tool calls close cleanly with correct arguments and ids. - All existing single-stream tests updated to obtain a context via `createStreamContext()` and pass it through to chunk conversion. - `pipeline.test.ts` mocks updated accordingly. packages/core test suite: 841 passed. No stale references to `resetStreamingToolCalls` or the private parser field remain. Refs QwenLM#3516 * docs(core): clarify GC wording in per-stream context comment (copilot review) * test(core): add pipeline-level integration test for concurrent streams Complements the unit tests in converter.test.ts by driving the real ContentGenerationPipeline + real OpenAIContentConverter (no mocks on converter) through two streams that interleave on the event loop via `setImmediate`-paced async generators. Two scenarios: 1. Happy path — two concurrent executeStream invocations with their own tool-call chunks. Assert each stream emits its own function call with the correct id and args (not cross-contaminated from the sibling stream). 2. Error isolation — one stream hits `error_finish` mid-flight while a sibling stream is still accumulating tool-call chunks. Assert the sibling's function call still emits cleanly, covering the removed `resetStreamingToolCalls()` call in the error path of processStreamWithLogging. Verified as a positive control: with the per-stream context fix reverted (origin/main state), both tests fail with exactly the bug shape users reported — one stream's function call is either overwritten by the other's id/args, or is swallowed entirely when the sibling stream's error path wipes the shared parser buffer. Refs QwenLM#3516
…3525) (QwenLM#3550)
* refactor(core): make OpenAI converter stateless to prevent shared-state races

Follow-up to QwenLM#3525. QwenLM#3516 showed that OpenAIContentConverter's long-lived per-pipeline state raced between concurrent streams; QwenLM#3525 scoped the streaming tool-call parser, and this removes the remaining shared state.

- OpenAIContentConverter is now a module of stand-alone functions; the exported symbol is a namespace object preserved for call-site compatibility.
- New RequestContext (in types.ts, alongside PipelineConfig and ErrorHandler) carries model, modalities, startTime, and an optional per-stream toolCallParser. The pipeline builds one per request and threads it through every conversion call.
- errorHandler drops duration/isStreaming; duration is recomputed from startTime at error time and troubleshooting text is uniform.
- convertOpenAIChunkToGemini now throws if toolCallParser is missing, so future misuse surfaces loudly instead of silently constructing a one-shot parser per chunk.

* test(core): align timeout expectations
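The "throws if toolCallParser is missing" rule in the last bullet can be pictured roughly as follows. This is only a sketch: the field names come from the commit message, but the field types, `ToolCallParserLike`, `RequestContextSketch`, and `requireToolCallParser` are hypothetical names made up for illustration and are not taken from the actual code.

```typescript
// Hypothetical stand-in for the real parser type (assumption, not the real API).
interface ToolCallParserLike {
  addChunk(index: number, fragment: string): void;
}

// Field names follow the commit message; the field types are assumptions.
interface RequestContextSketch {
  model: string;
  modalities: Record<string, boolean>;
  startTime: number;
  toolCallParser?: ToolCallParserLike;
}

// Fail loudly instead of quietly building a one-shot parser per chunk,
// which would lose every fragment buffered by earlier chunks.
function requireToolCallParser(ctx: RequestContextSketch): ToolCallParserLike {
  if (!ctx.toolCallParser) {
    throw new Error(
      "toolCallParser is missing: streaming conversion needs a per-stream parser",
    );
  }
  return ctx.toolCallParser;
}
```

A non-streaming request would simply leave `toolCallParser` unset and never reach the guard; only the streaming conversion path would call it.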
Summary
Fixes the real root cause behind the `Model stream ended with empty response text` subagent failures reported in #3516: `StreamingToolCallParser` is currently a per-`Converter` singleton, and concurrent streams (subagents, fork children, background agents, ACP parallel Agent calls after #3463) race on its state, corrupting each other's tool-call buffers.

A previous attempt at a narrower fix (#3521, now closed) only addressed same-stream id collisions. Reviewer feedback and independent analysis converged on the same conclusion: the problem is architectural.
Root cause
Every caller of `Config.getContentGenerator()` (foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls after #3463) hits the same parser instance. When two streams run concurrently:

1. Stream A enters `processStreamWithLogging`, calls `resetStreamingToolCalls()`, buffers `{"file_path":"/a` for `call_A` at `index=0`.
2. Stream B enters and calls `resetStreamingToolCalls()`, which wipes A's partial buffer.
3. Stream B buffers `{"file_path":"/b` into the now-empty `index=0` bucket.
4. Stream A's next chunk (`index=0`) routes to bucket 0, which now belongs to B.
5. `getCompletedToolCalls()` returns corrupt JSON like `{"file_path":"/a{"file_path":"/b/x.ts"}` that jsonrepair cannot salvage. All affected tool calls are dropped.
6. `geminiChat.processStreamResponse` throws `InvalidStreamError('NO_RESPONSE_TEXT')`, the user-visible symptom in #3516.

Real-world trigger: `coreToolScheduler` serializes foreground `agent()` calls (`Kind.Other`, not CONCURRENCY_SAFE), so the core path rarely hits this. ACP's `Session.runToolCalls` (after #3463) fans subagents out via `Promise.all` and does trigger it, as does any `run_in_background: true` subagent while the parent turn is still streaming.

Debug-log evidence from a `/review` session that spawned 4 concurrent `general-purpose` subagents: 29 `[JSON_PARSE] Failed to parse JSON even with jsonrepair` entries, 24 `NO_RESPONSE_TEXT` retries, and 4 subagent failures, all clustered in a 3-minute window where the 4 streams' tool-call chunks were interleaving through the shared parser.

Fix
Move parser state from Converter instance field into per-stream local state.
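A toy model may help show why scoping the state per stream is enough. The sketch below is not the real `StreamingToolCallParser`. `ToyToolCallParser`, `createToyStreamContext`, and the two runner functions are hypothetical names, and the parser is reduced to string concatenation per `index`. It feeds the same interleaved chunks once through a single shared parser and once through one context per stream.

```typescript
// Toy stand-in for the real parser: concatenates argument fragments per index.
class ToyToolCallParser {
  private buffers = new Map<number, string>();

  addChunk(index: number, fragment: string): void {
    this.buffers.set(index, (this.buffers.get(index) ?? "") + fragment);
  }

  getBuffer(index: number): string {
    return this.buffers.get(index) ?? "";
  }
}

// Mirrors the shape of the per-stream context described in this PR.
interface ToyStreamContext {
  toolCallParser: ToyToolCallParser;
}

function createToyStreamContext(): ToyStreamContext {
  return { toolCallParser: new ToyToolCallParser() };
}

type StreamId = "A" | "B";
interface Step {
  stream: StreamId;
  fragment: string;
}

// Two streams, each sending one tool call at index 0, interleaved chunk by chunk.
const interleaved: Step[] = [
  { stream: "A", fragment: '{"file_path":"/a' },
  { stream: "B", fragment: '{"file_path":"/b' },
  { stream: "A", fragment: '/x.ts"}' },
  { stream: "B", fragment: '/y.ts"}' },
];

// Before: every stream writes into the same parser, so index 0 is contested.
function runWithSharedParser(steps: Step[]): string {
  const shared = new ToyToolCallParser();
  for (const step of steps) {
    shared.addChunk(0, step.fragment);
  }
  return shared.getBuffer(0);
}

// After: each stream owns a context, so index 0 is private to that stream.
function runWithPerStreamContexts(steps: Step[]): Record<StreamId, string> {
  const contexts: Record<StreamId, ToyStreamContext> = {
    A: createToyStreamContext(),
    B: createToyStreamContext(),
  };
  for (const step of steps) {
    contexts[step.stream].toolCallParser.addChunk(0, step.fragment);
  }
  return {
    A: contexts.A.toolCallParser.getBuffer(0),
    B: contexts.B.toolCallParser.getBuffer(0),
  };
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```

In this model the shared run yields a single buffer that starts with `{"file_path":"/a{"file_path":"/b`, which is the corruption shape from the root-cause walkthrough, while the per-stream run yields two independently parseable argument strings.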
converter.ts

- New `ConverterStreamContext` interface wrapping the parser (keeps room for future per-stream state without signature churn).
- New `createStreamContext()` factory on the Converter returning a fresh context each call.
- `convertOpenAIChunkToGemini(chunk)` → `convertOpenAIChunkToGemini(chunk, ctx)`. All internal parser calls (`addChunk`, `getCompletedToolCalls`, `hasIncompleteToolCalls`) route through `ctx.toolCallParser`.
- Dropped the `streamingToolCallParser` instance field.
- Dropped `resetStreamingToolCalls()`: stream-local contexts are discarded when the generator unwinds, so no explicit reset is needed.

pipeline.ts

- `processStreamWithLogging` creates one context at stream entry: `const streamCtx = this.converter.createStreamContext();`
- Passes `streamCtx` to every `convertOpenAIChunkToGemini` call.
- Removed both `resetStreamingToolCalls()` call sites (entry and error path).

Why not the narrower fix (PR #3521)?
PR #3521 only allocated a fresh bucket when two distinct ids collided within a single stream. That leaves cross-stream sharing fully broken and only helps if a single provider reuses `index` across parallel calls within one response. Empirical evidence (`QWEN_CAPTURE_TOOL_CHUNKS=1` on a live qwen3.6-plus DashScope stream) showed distinct indices per call within a single stream, so same-stream collisions were not the driver of real-world failures. The driver was always cross-stream sharing.

Tests
- Replaced the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other.
- New regression test `demuxes interleaved chunks from two concurrent streams correctly (#3516)` that drives two interleaved streams through the same Converter via different contexts and asserts both function calls close cleanly with correct arguments and ids. This is the smallest reproducer of the exact bug shape.
- All existing single-stream tests updated to obtain a context via `createStreamContext()`.
- `pipeline.test.ts` mocks updated: `resetStreamingToolCalls: vi.fn()` → `createStreamContext: vi.fn(() => ({ toolCallParser: new StreamingToolCallParser() }))`. Existing assertions about reset count converted to `createStreamContext` expectations.
- `packages/core` test suite: 841 passed.

Scope & risk
No `Config` / `ContentGenerator` / `OpenAIContentGenerator` moves. The signature change is only on `OpenAIContentConverter.convertOpenAIChunkToGemini`, which is called from exactly one place (`pipeline.ts`).

Related closed issues likely resolved by this
`enable_thinking: true`; reporter confirmed qwen3-coder-plus worked but qwen3-max failed

Users on those threads may want to re-verify once this lands.
Other open issues with symptoms that may share this root cause
Not auto-closing these — the symptoms overlap but the causal chain is more speculative. Worth a second look once this lands; reporters may want to re-verify:
Closes #3516
Supersedes #3521