fix(cli): run ACP Agent tool calls concurrently (#2516)#3463
Conversation
When the model returns multiple Agent tool calls in a single turn, the ACP Session previously executed them sequentially in a plain for-loop, multiplying latency by the number of sub-agents spawned. Mirror the partition logic in coreToolScheduler.partitionToolCalls: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); any other tool forms its own sequential batch so the model's implicit ordering is preserved. Response-part ordering still matches the original functionCalls order. Add a focused test that uses controllable deferred executes to prove both Agent calls start before either resolves, and that the fed-back functionResponse ordering is stable regardless of resolution order.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Two issues raised by the /review bot: 1. The raw Promise.all fan-out bypassed the bounded-concurrency guard that coreToolScheduler applies via QWEN_CODE_MAX_TOOL_CONCURRENCY. Replaced with an inline runBounded helper that mirrors core's runConcurrently (Promise.race on a bounded executing set, default cap 10), keeping in-order result collection. 2. The concurrency test used a 10-iteration microtask yield loop before asserting both execute() spies had been invoked. That's fragile — runTool's pre-execute path (build → getDefaultPermission → evaluatePermissionRules → permission branch → PreToolUseHook) has more await boundaries than 10 ticks guarantees, and the CI run reported call-a still at 0 invocations at the assertion point. Reworked the test to wait on an explicit `called` deferred that resolves *inside* the execute() mock body. Under sequential behaviour only one `called` would ever fire → `Promise.all([called-a, called-b])` deadlocks → vitest's per-test timeout surfaces the regression. Under the fix both fire before either result resolves.
|
Thanks for the review. Pushed [Critical] test timing — the 10-tick microtask loop wasn't enough to cover the [Suggestion] bounded concurrency — replaced Happy to factor |
…itter The concurrency test for #2516 timed out on CI with "Test timed out in 5000ms" after the `await Promise.all([called-a, called-b])` rewrite in the previous review-fix commit. The 5000ms wait was the symptom; the root cause is that neither `execute()` was ever being called. runTool's AgentTool branch was guarded with `'eventEmitter' in invocation`, which is a *key-presence* check. The test mock provides `{ eventEmitter: undefined, ... }` — the key exists (value undefined), the branch is entered, and `SubAgentTracker.setup` immediately throws inside `eventEmitter.on(...)`. The try/catch in runTool swallows the throw and returns an error response, so `invocation.execute()` never runs, `called[id].resolve()` never fires, and the test deadlocks. The earlier review commit (4519c5f) interpreted the CI symptom as "10 microtask yields aren't enough" and rewrote the assertion around a deferred `Promise.all`. But the old test's `toHaveBeenCalledTimes(1)` failure with 0 invocations was already the same bug — execute was never called. The new formulation just converted the visible failure from an assertion mismatch into a timeout. Switch the guard to a truthy check against `invocation.eventEmitter`. Semantics for real AgentTool are unchanged — `agent.ts:392` declares `readonly eventEmitter: AgentEventEmitter = new AgentEventEmitter()`, so production always enters the branch. The only new behavior is that incomplete invocations (or test mocks) skip SubAgentTracker setup cleanly instead of crashing. `subAgentCleanupFunctions` stays `[]`, so the cleanup forEach at the success/error paths is a no-op.
End-to-end validation resultsPosting scripted + integration test results for the record. 1. Basic concurrencySpawned two Explore subagents in a single model turn via a direct ACP client (JSON-RPC over stdin/stdout to Prompt: Observed timing (script records Total wall-clock = max(40.8, 3.3) = 40.8s, not the pre-fix 44.1s sum. 2. Concurrency cap boundarySet
3. Non-agent tool regressionRan 3 integration cases in both Follow-up commit on this branchAfter the review commit, CI still failed with the concurrency test timing out at 5000ms. Root cause was not the microtask-yield count but a test-mock crash inside the AgentTool branch:
Switched the guard to a truthy check on Full Test matrix green on all 9 OS × Node combos after this change. |
yiliang114
left a comment
There was a problem hiding this comment.
Thanks for iterating on this. Latest head looks good to me.
The ACP path now mirrors the bounded-concurrency behavior from core, the regression test is exercising the real async boundary instead of relying on timing assumptions, and the eventEmitter guard fix makes the test path line up with production behavior more cleanly. No additional blockers from my side.
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review
合并前建议:补一批 partition 覆盖测试跑了一轮阳性控制,想把发现留在这里,给合并一个更硬的信心基线。 当前 PR 的并发测试钉死了"两个连续 Agent 在同一批里并行 + 响应按输入顺序"这一个 case,但 本地补了 6 个测试,用 deferred promise 钉住每一次 阳性控制(把
|
| 测试 | main | PR | 作用 |
|---|---|---|---|
| 原 PR 测试:2 Agent 并发 | ❌ timeout | ✅ | 并发主路径(已在 PR 里) |
[Agent, Read, Agent] 3 单 Agent 批 |
✅ | ✅ | 回归守卫:PR 不能把孤立 Agent 误合进并发批 |
| 连续 4 Agent 合并成一批 | ❌ timeout | ✅ | 合并逻辑 |
QWEN_CODE_MAX_TOOL_CONCURRENCY=2 + 4 Agent |
❌ timeout | ✅ | 限流开关非 1 值的实际效果 |
[Agent, Agent, Read, Agent, Agent] 切成 (par, seq, par) |
❌ timeout | ✅ | 混合 partition(之前 0 覆盖) |
| 连续 3 Read 保持顺序 | ✅ | ✅ | 回归守卫:PR 不能意外把非 Agent 并行化 |
并发批里 Agent throw 时的错误隔离 |
❌ timeout | ✅ | 批内一失败不连累兄弟,走 runTool 的 try/catch 转 errorResponse |
5 个判别式(main 失败 & PR 通过)+ 2 个回归守卫(两边都过,用于未来防退化)。
PR 分支上跑这批测试:7/7 绿,132ms。原 26 个 Session 测试无影响,全量 src/acp-integration/ 189 测试都过。Typecheck / lint 干净。
测试 mock 刻意不带 eventEmitter key,这样每个失败都是由"该 PR 的 runToolCalls/runBounded 并发逻辑缺失"造成,而不是被同 PR 里另一处 'eventEmitter' in invocation → 真值判断的修正误伤——否则两个改动会耦合在一起,不好定位。
patch
可以直接 git apply 到这个 PR 分支:
patch(展开)
diff --git a/packages/cli/src/acp-integration/session/Session.test.ts b/packages/cli/src/acp-integration/session/Session.test.ts
index 8e7a5cd6c..e0c6b3a98 100644
--- a/packages/cli/src/acp-integration/session/Session.test.ts
+++ b/packages/cli/src/acp-integration/session/Session.test.ts
@@ -1184,6 +1184,465 @@ describe('Session', () => {
.map((p) => p.functionResponse?.id);
expect(ids).toEqual(['call-a', 'call-b']);
});
+
+ // ------------------------------------------------------------------
+ // Additional coverage added to pin down the behaviour of
+ // `Session.runToolCalls` / `runBounded` across the full partition
+ // matrix (see PR #3463). All tests below use deferred promises so
+ // there are no wall-clock assumptions — each test fails via test
+ // timeout if the runtime serialises a step that should be parallel
+ // (or vice versa).
+ // ------------------------------------------------------------------
+
+ type Deferred<T> = {
+ promise: Promise<T>;
+ resolve: (v: T) => void;
+ reject: (e: Error) => void;
+ settled: boolean;
+ };
+
+ const makeDeferred = <T>(): Deferred<T> => {
+ const d: Deferred<T> = {
+ settled: false,
+ // placeholders; overwritten below
+ resolve: () => {},
+ reject: () => {},
+ promise: undefined as unknown as Promise<T>,
+ };
+ d.promise = new Promise<T>((res, rej) => {
+ d.resolve = (v) => {
+ d.settled = true;
+ res(v);
+ };
+ d.reject = (e) => {
+ d.settled = true;
+ rej(e);
+ };
+ });
+ return d;
+ };
+
+ // Drain pending microtasks so any microtask-scheduled work (e.g. a
+ // would-be-but-shouldn't-happen next `runTool`) has had every chance
+ // to run before we assert it did not.
+ const drainMicrotasks = async (n = 50) => {
+ for (let i = 0; i < n; i++) {
+ await Promise.resolve();
+ }
+ };
+
+ type ToolFactory = {
+ tool: {
+ name: string;
+ kind: core.Kind;
+ build: ReturnType<typeof vi.fn>;
+ };
+ entered: (id: string) => Deferred<void>;
+ result: (id: string) => Deferred<core.ToolResult>;
+ };
+
+ const makeToolFactory = (
+ name: string,
+ kind: core.Kind,
+ options: { throwOnExecute?: Set<string> } = {},
+ ): ToolFactory => {
+ const enteredMap = new Map<string, Deferred<void>>();
+ const resultMap = new Map<string, Deferred<core.ToolResult>>();
+ const throwSet = options.throwOnExecute ?? new Set<string>();
+
+ const getEntered = (id: string) => {
+ let d = enteredMap.get(id);
+ if (!d) {
+ d = makeDeferred<void>();
+ enteredMap.set(id, d);
+ }
+ return d;
+ };
+ const getResult = (id: string) => {
+ let d = resultMap.get(id);
+ if (!d) {
+ d = makeDeferred<core.ToolResult>();
+ resultMap.set(id, d);
+ }
+ return d;
+ };
+
+ return {
+ tool: {
+ name,
+ kind,
+ build: vi
+ .fn()
+ .mockImplementation((args: Record<string, unknown>) => {
+ const id = args['_test_id'] as string;
+ const enteredD = getEntered(id);
+ const resultD = getResult(id);
+ // Deliberately omit the `eventEmitter` key so the test
+ // isolates runToolCalls/runBounded behaviour from the
+ // separate `'eventEmitter' in invocation` guard fix.
+ return {
+ params: args,
+ getDefaultPermission: vi
+ .fn()
+ .mockResolvedValue('allow'),
+ getDescription: vi
+ .fn()
+ .mockReturnValue(`${name}:${id}`),
+ toolLocations: vi.fn().mockReturnValue([]),
+ execute: vi.fn().mockImplementation(async () => {
+ enteredD.resolve();
+ if (throwSet.has(id)) {
+ await resultD.promise; // allow test to sequence
+ throw new Error(`forced-fail:${id}`);
+ }
+ return resultD.promise;
+ }),
+ };
+ }),
+ },
+ entered: getEntered,
+ result: getResult,
+ };
+ };
+
+ const setupStream = (calls: Array<{ id: string; name: string }>) => {
+ const sendMessageStream = vi
+ .fn()
+ .mockResolvedValueOnce(
+ createStreamWithChunks([
+ {
+ type: core.StreamEventType.CHUNK,
+ value: {
+ functionCalls: calls.map((c) => ({
+ id: c.id,
+ name: c.name,
+ args: {
+ _test_id: c.id,
+ ...(c.name === core.ToolNames.AGENT
+ ? { subagent_type: 'explore' }
+ : {}),
+ },
+ })),
+ },
+ },
+ ]),
+ )
+ .mockResolvedValueOnce(createEmptyStream());
+ mockChat.sendMessageStream = sendMessageStream;
+ return sendMessageStream;
+ };
+
+ const configureRegistry = (
+ agentFactory: ToolFactory,
+ extra: Record<string, ToolFactory> = {},
+ ) => {
+ mockToolRegistry.getTool.mockImplementation((name: string) => {
+ if (name === core.ToolNames.AGENT) return agentFactory.tool;
+ return extra[name]?.tool;
+ });
+ mockConfig.getApprovalMode = vi
+ .fn()
+ .mockReturnValue(ApprovalMode.DEFAULT);
+ mockConfig.getPermissionManager = vi.fn().mockReturnValue(null);
+ };
+
+ const followUpIds = (sendMessageStream: ReturnType<typeof vi.fn>) => {
+ const followUp = sendMessageStream.mock.calls[1][1] as {
+ message: Array<{ functionResponse?: { id?: string } }>;
+ };
+ return followUp.message
+ .filter((p) => p.functionResponse)
+ .map((p) => p.functionResponse?.id);
+ };
+
+ const followUpResponses = (
+ sendMessageStream: ReturnType<typeof vi.fn>,
+ ) => {
+ const followUp = sendMessageStream.mock.calls[1][1] as {
+ message: Array<{
+ functionResponse?: {
+ id?: string;
+ response?: { error?: string; output?: unknown };
+ };
+ }>;
+ };
+ return followUp.message
+ .filter((p) => p.functionResponse)
+ .map((p) => p.functionResponse!);
+ };
+
+ it('partitions [Agent, Read, Agent] into 3 sequential batches', async () => {
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ );
+ const readF = makeToolFactory('read_file', core.Kind.Read);
+ configureRegistry(agentF, { read_file: readF });
+
+ const sendMessageStream = setupStream([
+ { id: 'a', name: core.ToolNames.AGENT },
+ { id: 'r', name: 'read_file' },
+ { id: 'b', name: core.ToolNames.AGENT },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'mixed' }],
+ });
+
+ // 1) Only Agent-A may run first — its own batch of size 1.
+ await agentF.entered('a').promise;
+ await drainMicrotasks();
+ expect(readF.entered('r').settled).toBe(false);
+ expect(agentF.entered('b').settled).toBe(false);
+
+ // 2) Resolve Agent-A → Read starts. Agent-B must still not start.
+ agentF.result('a').resolve({ llmContent: 'A', returnDisplay: 'A' });
+ await readF.entered('r').promise;
+ await drainMicrotasks();
+ expect(agentF.entered('b').settled).toBe(false);
+
+ // 3) Resolve Read → Agent-B starts.
+ readF.result('r').resolve({ llmContent: 'R', returnDisplay: 'R' });
+ await agentF.entered('b').promise;
+
+ agentF.result('b').resolve({ llmContent: 'B', returnDisplay: 'B' });
+ await promptPromise;
+
+ expect(followUpIds(sendMessageStream)).toEqual(['a', 'r', 'b']);
+ });
+
+ it('coalesces a run of 4 Agent calls into a single parallel batch', async () => {
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ );
+ configureRegistry(agentF);
+
+ const sendMessageStream = setupStream([
+ { id: 'a', name: core.ToolNames.AGENT },
+ { id: 'b', name: core.ToolNames.AGENT },
+ { id: 'c', name: core.ToolNames.AGENT },
+ { id: 'd', name: core.ToolNames.AGENT },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'four agents' }],
+ });
+
+ // All 4 execute() bodies must be entered before any result is
+ // resolved. If the runtime serialises any of them this deadlocks.
+ await Promise.all(
+ ['a', 'b', 'c', 'd'].map((id) => agentF.entered(id).promise),
+ );
+
+ // Resolve in reverse order to also check result-order stability.
+ agentF.result('d').resolve({ llmContent: 'D', returnDisplay: 'D' });
+ agentF.result('c').resolve({ llmContent: 'C', returnDisplay: 'C' });
+ agentF.result('b').resolve({ llmContent: 'B', returnDisplay: 'B' });
+ agentF.result('a').resolve({ llmContent: 'A', returnDisplay: 'A' });
+ await promptPromise;
+
+ expect(followUpIds(sendMessageStream)).toEqual(['a', 'b', 'c', 'd']);
+ });
+
+ it('honours QWEN_CODE_MAX_TOOL_CONCURRENCY=2 with 4 Agent calls', async () => {
+ const originalEnv = process.env['QWEN_CODE_MAX_TOOL_CONCURRENCY'];
+ process.env['QWEN_CODE_MAX_TOOL_CONCURRENCY'] = '2';
+ try {
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ );
+ configureRegistry(agentF);
+
+ const sendMessageStream = setupStream([
+ { id: 'a', name: core.ToolNames.AGENT },
+ { id: 'b', name: core.ToolNames.AGENT },
+ { id: 'c', name: core.ToolNames.AGENT },
+ { id: 'd', name: core.ToolNames.AGENT },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'four agents capped' }],
+ });
+
+ // Only the first 2 should enter.
+ await agentF.entered('a').promise;
+ await agentF.entered('b').promise;
+ await drainMicrotasks();
+ expect(agentF.entered('c').settled).toBe(false);
+ expect(agentF.entered('d').settled).toBe(false);
+
+ // Resolve A → C should now be admitted; D must still wait.
+ agentF.result('a').resolve({ llmContent: 'A', returnDisplay: 'A' });
+ await agentF.entered('c').promise;
+ await drainMicrotasks();
+ expect(agentF.entered('d').settled).toBe(false);
+
+ // Resolve B → D admitted.
+ agentF.result('b').resolve({ llmContent: 'B', returnDisplay: 'B' });
+ await agentF.entered('d').promise;
+
+ agentF.result('c').resolve({ llmContent: 'C', returnDisplay: 'C' });
+ agentF.result('d').resolve({ llmContent: 'D', returnDisplay: 'D' });
+ await promptPromise;
+
+ expect(followUpIds(sendMessageStream)).toEqual([
+ 'a',
+ 'b',
+ 'c',
+ 'd',
+ ]);
+ } finally {
+ if (originalEnv === undefined) {
+ delete process.env['QWEN_CODE_MAX_TOOL_CONCURRENCY'];
+ } else {
+ process.env['QWEN_CODE_MAX_TOOL_CONCURRENCY'] = originalEnv;
+ }
+ }
+ });
+
+ it('partitions [Agent, Agent, Read, Agent, Agent] into 3 batches (par, seq, par)', async () => {
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ );
+ const readF = makeToolFactory('read_file', core.Kind.Read);
+ configureRegistry(agentF, { read_file: readF });
+
+ const sendMessageStream = setupStream([
+ { id: 'a1', name: core.ToolNames.AGENT },
+ { id: 'a2', name: core.ToolNames.AGENT },
+ { id: 'r', name: 'read_file' },
+ { id: 'b1', name: core.ToolNames.AGENT },
+ { id: 'b2', name: core.ToolNames.AGENT },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'mixed batches' }],
+ });
+
+ // Batch 1: a1, a2 enter in parallel, before r / b1 / b2.
+ await Promise.all([
+ agentF.entered('a1').promise,
+ agentF.entered('a2').promise,
+ ]);
+ await drainMicrotasks();
+ expect(readF.entered('r').settled).toBe(false);
+ expect(agentF.entered('b1').settled).toBe(false);
+ expect(agentF.entered('b2').settled).toBe(false);
+
+ // Resolve a1+a2 (order irrelevant) → r enters alone.
+ agentF.result('a1').resolve({ llmContent: 'A1', returnDisplay: 'A1' });
+ agentF.result('a2').resolve({ llmContent: 'A2', returnDisplay: 'A2' });
+ await readF.entered('r').promise;
+ await drainMicrotasks();
+ expect(agentF.entered('b1').settled).toBe(false);
+ expect(agentF.entered('b2').settled).toBe(false);
+
+ // Resolve r → batch 3 fans out in parallel.
+ readF.result('r').resolve({ llmContent: 'R', returnDisplay: 'R' });
+ await Promise.all([
+ agentF.entered('b1').promise,
+ agentF.entered('b2').promise,
+ ]);
+
+ agentF.result('b1').resolve({ llmContent: 'B1', returnDisplay: 'B1' });
+ agentF.result('b2').resolve({ llmContent: 'B2', returnDisplay: 'B2' });
+ await promptPromise;
+
+ expect(followUpIds(sendMessageStream)).toEqual([
+ 'a1',
+ 'a2',
+ 'r',
+ 'b1',
+ 'b2',
+ ]);
+ });
+
+ it('keeps consecutive non-Agent tools strictly sequential', async () => {
+ // [Read, Read, Read] must NOT be parallelised — PR scope is
+ // Agent-only. Each Read only enters after the previous resolves.
+ const readF = makeToolFactory('read_file', core.Kind.Read);
+ // An agent factory is still registered as a dummy so configureRegistry
+ // is consistent; no Agent calls are issued here.
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ );
+ configureRegistry(agentF, { read_file: readF });
+
+ const sendMessageStream = setupStream([
+ { id: 'r1', name: 'read_file' },
+ { id: 'r2', name: 'read_file' },
+ { id: 'r3', name: 'read_file' },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'three reads' }],
+ });
+
+ await readF.entered('r1').promise;
+ await drainMicrotasks();
+ expect(readF.entered('r2').settled).toBe(false);
+ expect(readF.entered('r3').settled).toBe(false);
+
+ readF.result('r1').resolve({ llmContent: 'R1', returnDisplay: 'R1' });
+ await readF.entered('r2').promise;
+ await drainMicrotasks();
+ expect(readF.entered('r3').settled).toBe(false);
+
+ readF.result('r2').resolve({ llmContent: 'R2', returnDisplay: 'R2' });
+ await readF.entered('r3').promise;
+ readF.result('r3').resolve({ llmContent: 'R3', returnDisplay: 'R3' });
+ await promptPromise;
+
+ expect(followUpIds(sendMessageStream)).toEqual(['r1', 'r2', 'r3']);
+ });
+
+ it('isolates failures in a parallel Agent batch', async () => {
+ // Agent-a is configured to throw; Agent-b succeeds. The parallel
+ // batch must finish with both reported as functionResponses in
+ // input order, the failing one carrying an `error` field — no
+ // uncaught rejection, no missing sibling response.
+ const agentF = makeToolFactory(
+ core.ToolNames.AGENT,
+ core.Kind.Think,
+ { throwOnExecute: new Set(['a']) },
+ );
+ configureRegistry(agentF);
+
+ const sendMessageStream = setupStream([
+ { id: 'a', name: core.ToolNames.AGENT },
+ { id: 'b', name: core.ToolNames.AGENT },
+ ]);
+
+ const promptPromise = session.prompt({
+ sessionId: 'test-session-id',
+ prompt: [{ type: 'text', text: 'one-fails' }],
+ });
+
+ await Promise.all([
+ agentF.entered('a').promise,
+ agentF.entered('b').promise,
+ ]);
+
+ // Resolve-to-throw for a (its execute() does `await result; throw`)
+ // and resolve-ok for b.
+ agentF.result('a').resolve({ llmContent: '', returnDisplay: '' });
+ agentF.result('b').resolve({ llmContent: 'B', returnDisplay: 'B' });
+ await promptPromise;
+
+ const responses = followUpResponses(sendMessageStream);
+ expect(responses.map((r) => r.id)).toEqual(['a', 'b']);
+ expect(responses[0].response?.error).toMatch(/forced-fail:a/);
+ expect(responses[1].response?.error).toBeUndefined();
+ });
});
});
});要不要并入这个 PR 由你决定——我的理解是:混合 partition 这条路径的覆盖空白对未来的 refactor 风险最大,值得留一道护栏。其他几个也基本是零成本的断言,一次性把矩阵钉完也省得以后再想。
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Converts three copies of a sequential for (const fc of functionCalls) loop in the ACP Session prompt flow into a single runToolCalls helper that partitions calls into batches and runs consecutive Agent calls in parallel. Mirrors coreToolScheduler.partitionToolCalls / runConcurrently, respects QWEN_CODE_MAX_TOOL_CONCURRENCY (default 10), and preserves response-part ordering. Scope is conservative — only AGENT is treated as concurrent; Read/Grep/Shell/Edit still run one-at-a-time, deferring general tool parallelization to #3462. The bounded concurrency fix after bot review and the 'eventEmitter' in invocation root-cause fix both strengthened the change.
Verdict
APPROVE — correct, narrow, well-aligned with the core scheduler's pattern. Nice-to-haves (not blocking): a mixed-batch partition test, a comment in runToolCalls flagging the parallel-requestPermission UX caveat, and a PR-description update to call out the eventEmitter defensive fix alongside concurrency.
Heads up: branch currently has a merge conflict with main — would you mind rebasing so we can land this?
…ent-calls # Conflicts: # packages/cli/src/acp-integration/session/Session.test.ts # packages/cli/src/acp-integration/session/Session.ts
6c161db
#3516) (#3525) * fix(core): scope StreamingToolCallParser per stream, not per Converter Issue #3516 reports subagent failures with `Model stream ended with empty response text` whose real root cause is concurrent streams racing on a single shared tool-call parser. Architecture before this change: Config (singleton) └── contentGenerator (OpenAIContentGenerator) └── ContentGenerationPipeline └── OpenAIContentConverter └── streamingToolCallParser ← shared! Any caller of `Config.getContentGenerator()` — foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls (PR #3463) — ends up using the same parser instance. When two streams run concurrently, `processStreamWithLogging`'s stream-start `resetStreamingToolCalls()` wipes the other stream's in-flight buffers, and their chunks interleave at `index: 0`, producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...` that even jsonrepair cannot salvage. The corrupted tool calls are dropped entirely and the stream surfaces upstream as `NO_RESPONSE_TEXT`. Fix: move parser state from Converter instance field into per-stream local state. - Add `ConverterStreamContext` and `createStreamContext()` factory on `OpenAIContentConverter`. Each call returns a fresh context holding its own `StreamingToolCallParser`. - `convertOpenAIChunkToGemini(chunk, ctx)` now takes the context as an explicit arg; all internal parser calls route through it. - `ContentGenerationPipeline.processStreamWithLogging` creates one context at stream entry and passes it to every chunk conversion. - Drop `OpenAIContentConverter.streamingToolCallParser` field. - Drop `resetStreamingToolCalls()` — the context has stream-local lifetime, no manual reset needed. The two call sites in the pipeline (stream entry and error path) are removed. Tests: - Replace the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other. - Add a regression test simulating two concurrent streams with interleaved chunks through the same Converter instance; both tool calls close cleanly with correct arguments and ids. - All existing single-stream tests updated to obtain a context via `createStreamContext()` and pass it through to chunk conversion. - `pipeline.test.ts` mocks updated accordingly. packages/core test suite: 841 passed. No stale references to `resetStreamingToolCalls` or the private parser field remain. Refs #3516 * docs(core): clarify GC wording in per-stream context comment (copilot review) * test(core): add pipeline-level integration test for concurrent streams Complements the unit tests in converter.test.ts by driving the real ContentGenerationPipeline + real OpenAIContentConverter (no mocks on converter) through two streams that interleave on the event loop via `setImmediate`-paced async generators. Two scenarios: 1. Happy path — two concurrent executeStream invocations with their own tool-call chunks. Assert each stream emits its own function call with the correct id and args (not cross-contaminated from the sibling stream). 2. Error isolation — one stream hits `error_finish` mid-flight while a sibling stream is still accumulating tool-call chunks. Assert the sibling's function call still emits cleanly, covering the removed `resetStreamingToolCalls()` call in the error path of processStreamWithLogging. Verified as a positive control: with the per-stream context fix reverted (origin/main state), both tests fail with exactly the bug shape users reported — one stream's function call is either overwritten by the other's id/args, or is swallowed entirely when the sibling stream's error path wipes the shared parser buffer. Refs #3516
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review
#3516) (#3525) * fix(core): scope StreamingToolCallParser per stream, not per Converter Issue #3516 reports subagent failures with `Model stream ended with empty response text` whose real root cause is concurrent streams racing on a single shared tool-call parser. Architecture before this change: Config (singleton) └── contentGenerator (OpenAIContentGenerator) └── ContentGenerationPipeline └── OpenAIContentConverter └── streamingToolCallParser ← shared! Any caller of `Config.getContentGenerator()` — foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls (PR #3463) — ends up using the same parser instance. When two streams run concurrently, `processStreamWithLogging`'s stream-start `resetStreamingToolCalls()` wipes the other stream's in-flight buffers, and their chunks interleave at `index: 0`, producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...` that even jsonrepair cannot salvage. The corrupted tool calls are dropped entirely and the stream surfaces upstream as `NO_RESPONSE_TEXT`. Fix: move parser state from Converter instance field into per-stream local state. - Add `ConverterStreamContext` and `createStreamContext()` factory on `OpenAIContentConverter`. Each call returns a fresh context holding its own `StreamingToolCallParser`. - `convertOpenAIChunkToGemini(chunk, ctx)` now takes the context as an explicit arg; all internal parser calls route through it. - `ContentGenerationPipeline.processStreamWithLogging` creates one context at stream entry and passes it to every chunk conversion. - Drop `OpenAIContentConverter.streamingToolCallParser` field. - Drop `resetStreamingToolCalls()` — the context has stream-local lifetime, no manual reset needed. The two call sites in the pipeline (stream entry and error path) are removed. Tests: - Replace the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other. - Add a regression test simulating two concurrent streams with interleaved chunks through the same Converter instance; both tool calls close cleanly with correct arguments and ids. - All existing single-stream tests updated to obtain a context via `createStreamContext()` and pass it through to chunk conversion. - `pipeline.test.ts` mocks updated accordingly. packages/core test suite: 841 passed. No stale references to `resetStreamingToolCalls` or the private parser field remain. Refs #3516 * docs(core): clarify GC wording in per-stream context comment (copilot review) * test(core): add pipeline-level integration test for concurrent streams Complements the unit tests in converter.test.ts by driving the real ContentGenerationPipeline + real OpenAIContentConverter (no mocks on converter) through two streams that interleave on the event loop via `setImmediate`-paced async generators. Two scenarios: 1. Happy path — two concurrent executeStream invocations with their own tool-call chunks. Assert each stream emits its own function call with the correct id and args (not cross-contaminated from the sibling stream). 2. Error isolation — one stream hits `error_finish` mid-flight while a sibling stream is still accumulating tool-call chunks. Assert the sibling's function call still emits cleanly, covering the removed `resetStreamingToolCalls()` call in the error path of processStreamWithLogging. Verified as a positive control: with the per-stream context fix reverted (origin/main state), both tests fail with exactly the bug shape users reported — one stream's function call is either overwritten by the other's id/args, or is swallowed entirely when the sibling stream's error path wipes the shared parser buffer. Refs #3516
* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525) Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…enLM#3590, QwenLM#3505, QwenLM#3467) (#113) * fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525) Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): strip thinking blocks from history on model switch (QwenLM#3304) When switching models mid-session, reasoning_content fields from thinking-capable models leaked into API requests sent to the new provider, causing 422 errors on strict OpenAI-compatible endpoints. Call stripThoughtsFromHistory() in handleModelChange() so thought parts are removed before the next request is built for the new model. * fix(core): reject truncated subagent write_file calls (QwenLM#3505) Backport of upstream QwenLM#3505. Propagates MAX_TOKENS truncation from subagent responses into tool requests and rejects truncated edit calls before schema validation can surface misleading missing-parameter errors. Adapted to our fork's coreToolScheduler.ts which already had the truncation rejection block — kept both, dropped the unused clearRetryCountsForTool() call (we don't have that retry-counter machinery yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467) Backport of upstream QwenLM#3467. A permission rule with unbalanced parens was silently parsed with specifier: undefined, causing matchesRule to treat it as a catch-all. For deny rules this blocked all commands; for allow rules a typo could silently auto-approve everything. - Adds an invalid flag to PermissionRule - parseRule marks unbalanced-paren rules as invalid - matchesRule short-circuits invalid rules to never match - parseRules / addSession*Rule / addPersistentRule warn on malformed input - listRules filters invalid rules from /permissions UI Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): preserve reasoning_content during session resume and active sessions (GH#3579) * test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test The test from upstream QwenLM#3304 backport assumed an in-place qwen-oauth model switch path that our fork doesn't have; the source-side fix in config.ts (stripThoughtsFromHistory call in handleModelChange) is preserved. Coverage will be re-added when the fork's switch flow stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com> Co-authored-by: Yuchen Fu <fuyuchen0904@163.com>
…wenLM#3574) (#115) * fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525) Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): strip thinking blocks from history on model switch (QwenLM#3304) When switching models mid-session, reasoning_content fields from thinking-capable models leaked into API requests sent to the new provider, causing 422 errors on strict OpenAI-compatible endpoints. Call stripThoughtsFromHistory() in handleModelChange() so thought parts are removed before the next request is built for the new model. * fix(core): reject truncated subagent write_file calls (QwenLM#3505) Backport of upstream QwenLM#3505. Propagates MAX_TOKENS truncation from subagent responses into tool requests and rejects truncated edit calls before schema validation can surface misleading missing-parameter errors. Adapted to our fork's coreToolScheduler.ts which already had the truncation rejection block — kept both, dropped the unused clearRetryCountsForTool() call (we don't have that retry-counter machinery yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467) Backport of upstream QwenLM#3467. A permission rule with unbalanced parens was silently parsed with specifier: undefined, causing matchesRule to treat it as a catch-all. For deny rules this blocked all commands; for allow rules a typo could silently auto-approve everything. - Adds an invalid flag to PermissionRule - parseRule marks unbalanced-paren rules as invalid - matchesRule short-circuits invalid rules to never match - parseRules / addSession*Rule / addPersistentRule warn on malformed input - listRules filters invalid rules from /permissions UI Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): preserve reasoning_content during session resume and active sessions (GH#3579) * test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test The test from upstream QwenLM#3304 backport assumed an in-place qwen-oauth model switch path that our fork doesn't have; the source-side fix in config.ts (stripThoughtsFromHistory call in handleModelChange) is preserved. Coverage will be re-added when the fork's switch flow stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(acp): run Agent tool calls concurrently + graceful degrade (QwenLM#3463) Backport of upstream QwenLM#3463. When the model returns multiple Agent tool calls in a single turn, ACP Session was executing them sequentially in a for-loop, multiplying latency by sub-agent count. - Add private runToolCalls() helper that mirrors coreToolScheduler's partition logic: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); other tools form sequential batches. - Replace 2 for-loops in Session.ts with runToolCalls() calls. - Switch the AgentTool eventEmitter guard from key-presence check to truthy check (commit 651979c) — the key-presence check passed for { eventEmitter: undefined } and crashed inside SubAgentTracker.setup. Note: upstream replaced 3 for-loops; our fork only had 2 in those code paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(acp): support SSE and HTTP MCP servers in ACP mode In ACP mode, the Mcp server list sent by the IDE client can include SSE (type: "sse") and HTTP (type: "http") transports, but the previous implementation only handled stdio servers via toStdioServer(). Non-stdio servers were silently skipped (continue), so any SSE/HTTP-configured MCP server would never be registered. Changes: - Add toSseServer() helper: detects type=="sse" servers and maps them to MCPServerConfig(url=..., headers=...) - Add toHttpServer() helper: detects type=="http" servers and maps them to MCPServerConfig(httpUrl=..., headers=...) - Refactor newSessionConfig() loop to handle all three transport types - Declare mcpCapabilities: { sse: true, http: true } in agentCapabilities so IDE clients know this agent supports these transports without needing a transparent proxy - Export the three helper functions for unit testing Tests: - Unit tests for toStdioServer / toSseServer / toHttpServer helpers (type discrimination, mutual exclusion) - Integration-style tests for QwenAgent.initialize() mcpCapabilities - Integration-style tests for newSession() with SSE/HTTP MCP servers, verifying MCPServerConfig is constructed with the correct arguments (url vs httpUrl, headers passthrough, empty-headers → undefined) Fixes QwenLM#3472 --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com> Co-authored-by: Yuchen Fu <fuyuchen0904@163.com> Co-authored-by: LaZzyMan <zeusdream7@gmail.com>
* fix: backport upstream trivial bug fixes (QwenLM#3499, QwenLM#3630, QwenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): scope StreamingToolCallParser per-stream (QwenLM#3525) Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): strip thinking blocks from history on model switch (QwenLM#3304) When switching models mid-session, reasoning_content fields from thinking-capable models leaked into API requests sent to the new provider, causing 422 errors on strict OpenAI-compatible endpoints. Call stripThoughtsFromHistory() in handleModelChange() so thought parts are removed before the next request is built for the new model. * fix(core): reject truncated subagent write_file calls (QwenLM#3505) Backport of upstream QwenLM#3505. Propagates MAX_TOKENS truncation from subagent responses into tool requests and rejects truncated edit calls before schema validation can surface misleading missing-parameter errors. Adapted to our fork's coreToolScheduler.ts which already had the truncation rejection block — kept both, dropped the unused clearRetryCountsForTool() call (we don't have that retry-counter machinery yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): prevent malformed permission rules from becoming tool-wide catch-alls (QwenLM#3467) Backport of upstream QwenLM#3467. A permission rule with unbalanced parens was silently parsed with specifier: undefined, causing matchesRule to treat it as a catch-all. For deny rules this blocked all commands; for allow rules a typo could silently auto-approve everything. - Adds an invalid flag to PermissionRule - parseRule marks unbalanced-paren rules as invalid - matchesRule short-circuits invalid rules to never match - parseRules / addSession*Rule / addPersistentRule warn on malformed input - listRules filters invalid rules from /permissions UI Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): preserve reasoning_content during session resume and active sessions (GH#3579) * test(config): drop fork-incompatible QwenLM#3304 strip-thoughts test The test from upstream QwenLM#3304 backport assumed an in-place qwen-oauth model switch path that our fork doesn't have; the source-side fix in config.ts (stripThoughtsFromHistory call in handleModelChange) is preserved. Coverage will be re-added when the fork's switch flow stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(acp): run Agent tool calls concurrently + graceful degrade (QwenLM#3463) Backport of upstream QwenLM#3463. When the model returns multiple Agent tool calls in a single turn, ACP Session was executing them sequentially in a for-loop, multiplying latency by sub-agent count. - Add private runToolCalls() helper that mirrors coreToolScheduler's partition logic: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); other tools form sequential batches. - Replace 2 for-loops in Session.ts with runToolCalls() calls. - Switch the AgentTool eventEmitter guard from key-presence check to truthy check (commit 651979c) — the key-presence check passed for { eventEmitter: undefined } and crashed inside SubAgentTracker.setup. Note: upstream replaced 3 for-loops; our fork only had 2 in those code paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(acp): support SSE and HTTP MCP servers in ACP mode In ACP mode, the Mcp server list sent by the IDE client can include SSE (type: "sse") and HTTP (type: "http") transports, but the previous implementation only handled stdio servers via toStdioServer(). Non-stdio servers were silently skipped (continue), so any SSE/HTTP-configured MCP server would never be registered. Changes: - Add toSseServer() helper: detects type=="sse" servers and maps them to MCPServerConfig(url=..., headers=...) - Add toHttpServer() helper: detects type=="http" servers and maps them to MCPServerConfig(httpUrl=..., headers=...) - Refactor newSessionConfig() loop to handle all three transport types - Declare mcpCapabilities: { sse: true, http: true } in agentCapabilities so IDE clients know this agent supports these transports without needing a transparent proxy - Export the three helper functions for unit testing Tests: - Unit tests for toStdioServer / toSseServer / toHttpServer helpers (type discrimination, mutual exclusion) - Integration-style tests for QwenAgent.initialize() mcpCapabilities - Integration-style tests for newSession() with SSE/HTTP MCP servers, verifying MCPServerConfig is constructed with the correct arguments (url vs httpUrl, headers passthrough, empty-headers → undefined) Fixes QwenLM#3472 * fix(openai): when samplingParams is set, pass it through verbatim Previously pipeline.ts always hardcoded max_tokens as the output-token parameter name on the OpenAI-compatible path, falling back from samplingParams.max_tokens to request.config.maxOutputTokens to provider defaults. This broke GPT-5 / o-series on OpenAI and Azure OpenAI, which require max_completion_tokens and reject max_tokens with a 400 error. Fix: when the user provides samplingParams explicitly, treat it as the complete source of truth for the wire shape and pass its keys through verbatim. No client-injected defaults, no request fallbacks, no hardcoded parameter names. The user describes what the provider wants; the client trusts them. When samplingParams is absent, the historical default behavior (request fallback through temperature/top_p/.../max_tokens plus provider defaults) is preserved unchanged — existing users see no difference. Concretely, users can now set any of: samplingParams: { max_tokens: 4096 } # GPT-4 / Qwen / DeepSeek samplingParams: { max_completion_tokens: 4096 } # GPT-5 / o-series samplingParams: { reasoning_effort: 'medium' } # future knobs without waiting for a qwen-code release that adds model-specific branches. Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com> --------- Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com> Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com> Co-authored-by: Yuchen Fu <fuyuchen0904@163.com> Co-authored-by: LaZzyMan <zeusdream7@gmail.com> Co-authored-by: Gordon Lam (SH) <yeelam@microsoft.com>
QwenLM#3516) (QwenLM#3525) * fix(core): scope StreamingToolCallParser per stream, not per Converter Issue QwenLM#3516 reports subagent failures with `Model stream ended with empty response text` whose real root cause is concurrent streams racing on a single shared tool-call parser. Architecture before this change: Config (singleton) └── contentGenerator (OpenAIContentGenerator) └── ContentGenerationPipeline └── OpenAIContentConverter └── streamingToolCallParser ← shared! Any caller of `Config.getContentGenerator()` — foreground turns, fork subagents, `run_in_background: true` subagents, ACP concurrent Agent calls (PR QwenLM#3463) — ends up using the same parser instance. When two streams run concurrently, `processStreamWithLogging`'s stream-start `resetStreamingToolCalls()` wipes the other stream's in-flight buffers, and their chunks interleave at `index: 0`, producing corrupt JSON like `{"file_path": "/A{"file_path": "/B...` that even jsonrepair cannot salvage. The corrupted tool calls are dropped entirely and the stream surfaces upstream as `NO_RESPONSE_TEXT`. Fix: move parser state from Converter instance field into per-stream local state. - Add `ConverterStreamContext` and `createStreamContext()` factory on `OpenAIContentConverter`. Each call returns a fresh context holding its own `StreamingToolCallParser`. - `convertOpenAIChunkToGemini(chunk, ctx)` now takes the context as an explicit arg; all internal parser calls route through it. - `ContentGenerationPipeline.processStreamWithLogging` creates one context at stream entry and passes it to every chunk conversion. - Drop `OpenAIContentConverter.streamingToolCallParser` field. - Drop `resetStreamingToolCalls()` — the context has stream-local lifetime, no manual reset needed. The two call sites in the pipeline (stream entry and error path) are removed. Tests: - Replace the `resetStreamingToolCalls` suite with a `createStreamContext` suite asserting that distinct contexts are independent and writes to one never leak into the other. - Add a regression test simulating two concurrent streams with interleaved chunks through the same Converter instance; both tool calls close cleanly with correct arguments and ids. - All existing single-stream tests updated to obtain a context via `createStreamContext()` and pass it through to chunk conversion. - `pipeline.test.ts` mocks updated accordingly. packages/core test suite: 841 passed. No stale references to `resetStreamingToolCalls` or the private parser field remain. Refs QwenLM#3516 * docs(core): clarify GC wording in per-stream context comment (copilot review) * test(core): add pipeline-level integration test for concurrent streams Complements the unit tests in converter.test.ts by driving the real ContentGenerationPipeline + real OpenAIContentConverter (no mocks on converter) through two streams that interleave on the event loop via `setImmediate`-paced async generators. Two scenarios: 1. Happy path — two concurrent executeStream invocations with their own tool-call chunks. Assert each stream emits its own function call with the correct id and args (not cross-contaminated from the sibling stream). 2. Error isolation — one stream hits `error_finish` mid-flight while a sibling stream is still accumulating tool-call chunks. Assert the sibling's function call still emits cleanly, covering the removed `resetStreamingToolCalls()` call in the error path of processStreamWithLogging. Verified as a positive control: with the per-stream context fix reverted (origin/main state), both tests fail with exactly the bug shape users reported — one stream's function call is either overwritten by the other's id/args, or is swallowed entirely when the sibling stream's error path wipes the shared parser buffer. Refs QwenLM#3516
…#3463) * fix(cli): run ACP Agent tool calls concurrently (QwenLM#2516) When the model returns multiple Agent tool calls in a single turn, the ACP Session previously executed them sequentially in a plain for-loop, multiplying latency by the number of sub-agents spawned. Mirror the partition logic in coreToolScheduler.partitionToolCalls: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); any other tool forms its own sequential batch so the model's implicit ordering is preserved. Response-part ordering still matches the original functionCalls order. Add a focused test that uses controllable deferred executes to prove both Agent calls start before either resolves, and that the fed-back functionResponse ordering is stable regardless of resolution order. * Address PR QwenLM#3463 review: bound concurrency + robust test timing Two issues raised by the /review bot: 1. The raw Promise.all fan-out bypassed the bounded-concurrency guard that coreToolScheduler applies via QWEN_CODE_MAX_TOOL_CONCURRENCY. Replaced with an inline runBounded helper that mirrors core's runConcurrently (Promise.race on a bounded executing set, default cap 10), keeping in-order result collection. 2. The concurrency test used a 10-iteration microtask yield loop before asserting both execute() spies had been invoked. That's fragile — runTool's pre-execute path (build → getDefaultPermission → evaluatePermissionRules → permission branch → PreToolUseHook) has more await boundaries than 10 ticks guarantees, and the CI run reported call-a still at 0 invocations at the assertion point. Reworked the test to wait on an explicit `called` deferred that resolves *inside* the execute() mock body. Under sequential behaviour only one `called` would ever fire → `Promise.all([called-a, called-b])` deadlocks → vitest's per-test timeout surfaces the regression. Under the fix both fire before either result resolves. * fix(acp): degrade gracefully when AgentTool invocation has no eventEmitter The concurrency test for QwenLM#2516 timed out on CI with "Test timed out in 5000ms" after the `await Promise.all([called-a, called-b])` rewrite in the previous review-fix commit. The 5000ms wait was the symptom; the root cause is that neither `execute()` was ever being called. runTool's AgentTool branch was guarded with `'eventEmitter' in invocation`, which is a *key-presence* check. The test mock provides `{ eventEmitter: undefined, ... }` — the key exists (value undefined), the branch is entered, and `SubAgentTracker.setup` immediately throws inside `eventEmitter.on(...)`. The try/catch in runTool swallows the throw and returns an error response, so `invocation.execute()` never runs, `called[id].resolve()` never fires, and the test deadlocks. The earlier review commit (705102aff) interpreted the CI symptom as "10 microtask yields aren't enough" and rewrote the assertion around a deferred `Promise.all`. But the old test's `toHaveBeenCalledTimes(1)` failure with 0 invocations was already the same bug — execute was never called. The new formulation just converted the visible failure from an assertion mismatch into a timeout. Switch the guard to a truthy check against `invocation.eventEmitter`. Semantics for real AgentTool are unchanged — `agent.ts:392` declares `readonly eventEmitter: AgentEventEmitter = new AgentEventEmitter()`, so production always enters the branch. The only new behavior is that incomplete invocations (or test mocks) skip SubAgentTracker setup cleanly instead of crashing. `subAgentCleanupFunctions` stays `[]`, so the cleanup forEach at the success/error paths is a no-op.
Summary
Closes #2516.
When the model returns multiple Agent tool calls in a single turn, the ACP Session previously executed them sequentially in a plain for-loop, multiplying latency by the number of sub-agents spawned.
This mirrors the partition logic in
coreToolScheduler.partitionToolCalls: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); any other tool forms its own sequential batch so the model's implicit ordering is preserved. Response-part ordering still matches the originalfunctionCallsorder.Changes
Session.runToolCalls(signal, promptId, functionCalls)that partitions calls and executes concurrent batches viaPromise.all.for (const fc of functionCalls)loop inSession.ts:518 / 753 / 934.runTool).Test plan
Session.test.ts > prompt > tool call concurrency: uses deferred-resolve executes to prove both Agent calls are invoked before either resolves, and that fed-backfunctionResponseordering matches the originalfunctionCallsorder regardless of resolution order.Session.tserror count unchanged vsmain; the pre-existing errors are unrelated export-path issues).subagent_type=exploretasks in one turn, verify tool-call updates stream concurrently in the editor.