fix(core): scope StreamingToolCallParser per-stream (#3525)#111
Merged
Conversation
…wenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
WalkthroughThis PR refactors streaming state management in OpenAI content conversion by replacing shared mutable parser state with isolated per-stream contexts, adds filesystem watcher configuration exports to the skill manager, updates telemetry serialization to handle cyclic objects, and introduces integration tests validating concurrent stream isolation. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Pipeline as ContentGenerationPipeline
participant Converter as OpenAIContentConverter
participant Context as StreamContext
participant Parser as StreamingToolCallParser
Client->>Pipeline: executeStream(stream)
activate Pipeline
Pipeline->>Converter: createStreamContext()
activate Converter
Converter->>Context: new ConverterStreamContext()
activate Context
Context->>Parser: new StreamingToolCallParser()
activate Parser
Converter-->>Pipeline: context
deactivate Converter
deactivate Parser
loop For each OpenAI chunk
Client->>Pipeline: async iteration
Pipeline->>Converter: convertOpenAIChunkToGemini(chunk, context)
activate Converter
Converter->>Parser: addChunk(data)
activate Parser
Parser-->>Converter: accumulated state
deactivate Parser
Converter->>Context: processThinkChunk(..., ctx)
Converter-->>Pipeline: GenerateContentResponse
deactivate Converter
end
Pipeline-->>Client: yield response
deactivate Pipeline
deactivate Context
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Backport of upstream QwenLM#3525 — fixes a critical concurrency bug where parallel subagents / ACP concurrent calls corrupted each other's tool-call buffers, surfacing as
NO_RESPONSE_TEXT.Extended upstream's design to also scope our fork's
<think>-tag parser state per-stream, since that had the same shared-state problem.What changed
ConverterStreamContextinterface (toolCallParser + thinkBuffer + inThinkTag)createStreamContext()factory replacesresetStreamingToolCalls()convertOpenAIChunkToGemini(chunk, ctx)andprocessThinkChunk(chunk, ctx)thread context throughContentGenerationPipelinecreates one context per stream entryprotoInternalrecovery-note logic preserved on the new shapeDeferred
Upstream's QwenLM#3550 (full stateless converter refactor) is not included — it's a hygiene refactor without a functional bug; QwenLM#3525 alone fixes the concurrency race. Worth doing later if we want to land QwenLM#3550 cleanly.
Test plan
openaiContentGenerator/(incl. newcreateStreamContextsuite)pipeline.concurrent.test.ts(from upstream 38edd9d) — two real concurrent streams, neither corrupts the other🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Improvements