feat(core): Workflow P2 — parallel() + pipeline() concurrent fan-out (#4721)#4947
Conversation
qwen-code-ci-bot
left a comment
Hi @LaZzyMan — thanks for this PR, the work on parallel() and pipeline() looks substantial and the direction is well-aligned (Claude Code ships dynamic workflows with concurrent fan-out too, so this is clearly the right area to build in).
However, the PR body is missing most of the required template sections. The PR template asks for:
- `## What this PR does`: you have `## What` (close, but the heading should match)
- `## Why it's needed`: missing (motivation, the problem being solved)
- `## Reviewer Test Plan`: missing (How to verify, Evidence Before/After, Tested on)
- `## Risk & Scope`: you have `## Scope` but are missing the risk/tradeoff and out-of-scope sub-items
- `## Linked Issues`: missing (please use `Closes #N` / `Related #N` format)
- `<details>` Chinese translation: missing
These sections aren't bureaucratic checkboxes — they're how reviewers quickly understand motivation, verify correctness, and assess risk. Without the Reviewer Test Plan especially, reviewers can't confirm the changes work without setting up the test environment from scratch.
Could you update the PR body to follow the template? Once that's sorted, happy to continue the review.
— Qwen Code · qwen3.7-max
|
Updated the PR body to follow the template — added |
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
wenshao
left a comment
Test coverage gaps for pipeline() (not mapped to a single diff line):
- `pipeline()` abort path has zero tests: `parallel()` has two (pre-aborted + mid-flight), `pipeline()` has none. A bug in the pipeline-specific abort wiring would go undetected.
- No test exercises `parallel()` + `pipeline()` concurrently on the same limiter. A regression that gave each its own limiter would pass all existing tests.
- `pipeline()` is not tested against the 1000-agent cap (unlike `parallel()`, which has a dedicated test).
- No `pipeline()` end-to-end test through `WorkflowTool` (unlike `parallel()`, which has one).
— qwen3.7-plus via Qwen Code /review
DragonnZhang
left a comment
Review: APPROVE
Reviewed all 8 changed files across the P2 Dynamic Workflows implementation (parallel/pipeline primitives, 1000-agent cap, env-overridable caps, vm-realm security hardening).
Summary of analysis (no high-confidence issues found):
- Correctness: The concurrency limiter implements a clean sliding-window pattern with proper abort handling. The agent cap correctly funnels all dispatch paths (sequential, parallel, pipeline) through a single counted wrapper. Pipeline's null-sentinel and errors-as-data contracts are consistent and well-defined.
- Security: The vm-realm escape via host arrays is properly closed by per-element in-realm JSON revival. The regression tests (outer array, nested objects, pipeline results) verify the fix and were reported to FAIL against a verbatim wrapper. The per-element revival also correctly handles non-serializable values (BigInt, circular refs) without crashing siblings.
- Test coverage: 154 tests cover concurrency caps, errors-as-data, order preservation, pipeline staggering (no inter-stage barrier), mid-flight abort, 1000-cap across all launch paths, env overrides, and the vm-realm escape regression.
- Code quality: Well-documented with clear design rationale in comments. Additive changes that preserve P1 behavior. Error messages are helpful for model-authored scripts.
Checks skipped: Deterministic lint/typecheck could not run in this environment (worktree dependency setup timed out). PR description states tsc --noEmit and eslint are clean on all touched files.
A p-limit-style concurrency limiter that keeps at most `limit` thunks in flight and starts a queued thunk the instant a slot frees, so one instance can be SHARED across several fan-out calls and still hold the total in-flight count under a single cap. The existing in-repo concurrency control (memoryDiscovery.ts) is fixed-size sequential batching, not a sliding window, so this is a new primitive rather than a refactor.
API:
- run(thunk): schedule one thunk through the shared window; rejections propagate raw.
- settleAll(thunks): batch convenience that resolves to a position-aligned Array<T|null> where a rejected thunk becomes null (errors-as-data) and the ONLY rejection is an abort of the limiter's signal, so an aborted run surfaces as a rejection rather than a silent array of nulls.
Guards a non-positive-integer limit (mirrors background-tasks.ts), preserves input order, short-circuits empty input, and (with an AbortSignal) refuses to start new queued work once aborted.
13 unit tests cover the window cap, errors-as-data, order, sharing across calls, and the abort paths.
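The sliding-window behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `createConcurrencyLimiter`: the names `createLimiter`, `pump`, and `Thunk` are hypothetical, and abort handling and `settleAll()` are left out.

```typescript
// Hypothetical sketch of a p-limit-style sliding-window limiter.
// Not the PR's code: no AbortSignal support, no settleAll().
type Thunk<T> = () => Promise<T>;

function createLimiter(limit: number): { run: <T>(thunk: Thunk<T>) => Promise<T> } {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  let inFlight = 0;
  const queue: Array<() => void> = [];

  // Start queued jobs for as long as a slot is free.
  const pump = (): void => {
    while (inFlight < limit && queue.length > 0) {
      const start = queue.shift();
      if (start === undefined) break;
      inFlight++;
      start();
    }
  };

  function run<T>(thunk: Thunk<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(async () => {
        try {
          resolve(await thunk());
        } catch (err) {
          reject(err); // rejections propagate raw to the caller
        } finally {
          inFlight--; // the slot frees whether the thunk resolved or rejected
          pump();     // so the next queued thunk starts immediately
        }
      });
      pump();
    });
  }

  return { run };
}
```

Because the in-flight counter lives in the closure, several callers that hold the same limiter object share one window, which is the property the commit message relies on.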
Implements the P2 phase of the dynamic-workflow port (#4721): concurrent fan-out primitives on top of P1's sequential agent().
parallel(thunks)
- Runs thunks through a per-run shared sliding window (createConcurrencyLimiter, cap = max(1, min(16, cpus-2)); the max() guards 1–2 core machines).
- Resolves to a position-aligned array; a thunk that throws becomes null at its index (errors-as-data). parallel() itself only rejects on abort, so an aborted run surfaces a rejection rather than a silent array of nulls.
- Rejects on a non-function element (eager promise instead of a thunk).
pipeline(items, ...stages)
- Parallel-of-chains: one thunk per item, all sharing the SAME window, so it is staggered (item A can be in stage 3 while item B is in stage 1) with no inter-stage barrier. Stage callbacks receive (prev, item, idx); the first stage's prev is the item itself. A stage that throws OR returns null drops that item to null and skips its remaining stages.
1000-agent-per-run cap
- The orchestrator wraps this.dispatch with a counter, so EVERY agent() call (sequential, parallel, or pipeline) funnels through one chokepoint. A fan-out cannot bypass the cap; the 1001st call throws.
SECURITY: vm-realm result revival (closes an uncovered escape)
- vmAsync's resolve path is verbatim: it does NOT re-wrap resolved values. The host parallel/pipeline impl resolves with a HOST-realm array, so handing it to the script would reopen the T1/T8/T14 escape (out.constructor.constructor('return process')() reaches host process via the host Array.prototype chain). The vm wrapper now revives the array in-realm with JSON.parse(JSON.stringify(...)), the same mechanism that makes `args` safe, before the script sees it. The pre-P2 escape test only probed the *Promise* (already vm-realm), not the resolved array; new tests probe the resolved array (outer + nested) and were verified to FAIL against a verbatim wrapper.
Real impls are injected by the orchestrator; the sandbox keeps its throwing P1-unsupported stubs as the default when parallel/pipeline are not injected, so an un-wired sandbox still gives a clear error.
Tool description updated to document the P2 surface.
Full workflow suite + new orchestrator/sandbox/tool tests green; tsc + eslint clean on all touched files.
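The pipeline() contract described in this commit (parallel-of-chains, `prev` starting as the item, throw or null dropping the item) could look roughly like the sketch below. `pipelineSketch` and `Stage` are hypothetical names; the shared concurrency window, the agent cap, and abort handling are deliberately omitted.

```typescript
// Hypothetical sketch of pipeline() as a "parallel of chains".
// Assumption: no limiter, no agent cap, no abort; only the data contract.
type Stage = (prev: unknown, item: unknown, idx: number) => unknown | Promise<unknown>;

function pipelineSketch(items: unknown[], ...stages: Stage[]): Promise<unknown[]> {
  const chains = items.map(async (item, idx) => {
    try {
      let prev: unknown = item; // the first stage's prev is the item itself
      for (const stage of stages) {
        prev = await stage(prev, item, idx);
        if (prev === null) return null; // null drops the item, later stages are skipped
      }
      return prev;
    } catch {
      return null; // a throwing stage becomes null at this index (errors-as-data)
    }
  });
  // Each chain advances on its own, so there is no barrier between stages.
  return Promise.all(chains);
}
```

Since each item owns its chain, a fast item can be several stages ahead of a slow one, which is the staggering the commit message describes.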
…l description
Adversarial self-review (6-dimension finder fan-out) surfaced two genuine defects plus three test gaps.
EAD-1 [major]: reviveInRealm did JSON.parse(JSON.stringify(WHOLE array)), so a single slot whose VALUE is non-serializable (a thunk that returns a BigInt or a circular object) threw on the entire array and REJECTED the whole parallel()/pipeline(), destroying every sibling result. That defeats errors-as-data for return values. Revive PER-ELEMENT instead: a bad slot becomes null at its index, siblings survive, and the outer array is still built in-realm so the host-process escape stays closed. Regression test: parallel([() => 'a', () => 1n, () => 'c', () => circular]) => ['a', null, 'c', null].
API-1 [major]: the WorkflowTool top-level description (passed to super()) still read "No parallel, no pipeline" while the param-schema description and the runtime both now support them. Updated to describe the P2 surface (parallel/pipeline, ≤16 in flight, ≤1000 total). Also refreshed the now-stale "scheduled for P2" messages on the un-injected fallback stubs to an accurate "unavailable: sandbox created without an implementation" wording.
Test gaps closed:
- TST-1: pipeline() now has a concurrency test proving it shares the SAME per-run window as parallel (peak in-flight === cap), so a pipeline impl that bypassed the shared limiter would fail.
- TST-2: pipeline() staggering is now tested: item 0 reaches stage 2 long before item 1's slow stage 1 finishes, proving there is no inter-stage barrier (a stage-by-stage barrier impl would fail the <50ms threshold).
- TST-3: mid-flight abort through the orchestrator is now tested (the prior test only used a pre-aborted controller), proving parallel() rejects after dispatches start rather than resolving with a silent array of nulls.
150 workflow-suite tests green; tsc + eslint clean on touched files.
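The difference between whole-array and per-element revival in EAD-1 can be illustrated with a host-side sketch. `reviveEach` is a hypothetical name, and this version only demonstrates the null-per-slot behaviour: it runs in the host realm, so it does not provide the realm isolation that the PR's in-vm `reviveInRealm` exists for.

```typescript
// Hypothetical host-side sketch of per-element JSON revival.
// It shows only the "bad slot becomes null, siblings survive" rule;
// it does NOT reproduce the vm-realm isolation of the real function.
function reviveEach(values: unknown[]): unknown[] {
  const out: unknown[] = [];
  for (let i = 0; i < values.length; i++) {
    try {
      // JSON.stringify throws on circular structures (and on BigInt values).
      out.push(JSON.parse(JSON.stringify(values[i])));
    } catch {
      out.push(null); // only this index is lost
    }
  }
  return out;
}

const circular: Record<string, unknown> = {};
circular.self = circular;

const revived = reviveEach(['a', circular, 'c', { n: 1 }]);
console.log(JSON.stringify(revived)); // ["a",null,"c",{"n":1}]
```

A whole-array round-trip on the same input would throw once and lose all four slots, which is the failure mode the commit fixes.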
Mirror the established QWEN_CODE_MAX_BACKGROUND_AGENTS / P1 QWEN_CODE_MAX_WORKFLOW_SECONDS precedent so operators can tune the P2 caps without a code change:
- QWEN_CODE_MAX_WORKFLOW_AGENTS: override the per-run 1000-agent cap.
- QWEN_CODE_MAX_WORKFLOW_CONCURRENCY: override the cpu-derived min(16, cpus-2) in-flight window with an explicit integer.
Both use the house resolver shape (resolveMaxConcurrentBackgroundAgents): a non-integer / <1 value is rejected with a debug warning and the default is used. The agent cap default is renamed DEFAULT_MAX_AGENTS_PER_RUN and the cap message is now built from the resolved value. Resolvers take an injectable env arg for pure unit testing.
Adds resolver unit tests (default / valid / invalid) plus an integration test proving QWEN_CODE_MAX_WORKFLOW_AGENTS=3 makes a 4-thunk parallel() yield exactly 3 results + 1 null at run time.
154 workflow-suite tests green.
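A resolver of the shape described here might look like the following sketch. The function names, the `Env` type, and the treatment of an empty string as "unset" are assumptions made for illustration; only the env variable names, the default formula, and the reject-and-fall-back rule come from the commit messages.

```typescript
// Hypothetical sketch of an env-overridable cap resolver with an injectable env.
type Env = Record<string, string | undefined>;

// Default in-flight window: max(1, min(16, cpus - 2)).
function defaultConcurrency(cpuCount: number): number {
  return Math.max(1, Math.min(16, cpuCount - 2));
}

function resolvePositiveInt(
  env: Env,
  name: string,
  fallback: number,
  warn: (msg: string) => void = () => {},
): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback; // assumption: empty means unset
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    warn(`${name}=${raw} is not a positive integer; using default ${fallback}`);
    return fallback;
  }
  return n;
}

const agents = resolvePositiveInt({ QWEN_CODE_MAX_WORKFLOW_AGENTS: '3' }, 'QWEN_CODE_MAX_WORKFLOW_AGENTS', 1000);
console.log(agents);                // 3
console.log(defaultConcurrency(3)); // 1
```

Passing the environment in as an argument, rather than reading a global, is what allows the default / valid / invalid cases to be tested without mutating process state.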
…t the thunk layer
P2 self-review (independent adversarial fan-out) caught a real deadlock the mock-tested suite missed: the concurrency window was applied at the thunk level (parallel()/pipeline() scheduled their thunks through the shared limiter). A nested fan-out, e.g. `pipeline([items], item => parallel([...]))`, the canonical /deep-research shape, would have every outer slot held by an outer thunk awaiting an inner settleAll() whose thunks can never acquire a slot. pump() only re-runs from an in-flight thunk's finally, so the queue never drains: unrecoverable silent hang until the 30-min wall clock. On 1-3 core machines (limit = 1) a SINGLE nested call deadlocks; abort cannot break it because pump() is never re-invoked.
Fix: the window throttles AGENT DISPATCHES, not orchestration thunks. The limiter now wraps `this.dispatch` inside countedDispatch, so only leaf agent() calls acquire a slot; parallel()/pipeline() compose promises freely via a plain Promise.allSettled + position-aligned null-map (settleToNullArray) and cannot deadlock when nested. This is also the correct "N agents in flight per run" semantics (the cap is about concurrent model calls, not orchestration depth) and makes abort prompt (dispatch slots free normally).
The limiter's unused settleAll() is removed: its errors-as-data null-mapping + abort-reject moved into settleToNullArray in the orchestrator, where the batch semantics belong.
The tool description's "≤16" is softened to "16 by default" now that QWEN_CODE_MAX_WORKFLOW_CONCURRENCY can raise it.
Adds two RED-verified regression tests (nested parallel-in-pipeline and parallel-of-parallel, forced to concurrency=1) that deadlocked before the fix and now resolve in ms.
150 workflow-suite tests green; tsc + eslint clean.
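The null-mapping step that replaced settleAll() can be sketched as below. This is an illustration under assumptions: `settleToNullArraySketch` and `AbortLike` are hypothetical names, and the abort error is a plain `Error` renamed to `AbortError` as a stand-in, not necessarily what the PR throws. Because the function holds no concurrency slot itself, nesting it cannot starve an inner call.

```typescript
// Hypothetical sketch of a position-aligned null-map over Promise.allSettled.
// Assumption: abort is signalled by any object exposing `aborted`.
interface AbortLike {
  readonly aborted: boolean;
}

async function settleToNullArraySketch<T>(
  promises: Array<Promise<T>>,
  signal?: AbortLike,
): Promise<Array<T | null>> {
  const settled = await Promise.allSettled(promises);
  if (signal?.aborted) {
    // Stand-in abort error: an aborted run rejects instead of returning nulls.
    const err = new Error('Workflow run aborted.');
    err.name = 'AbortError';
    throw err;
  }
  // A rejected slot becomes null at its own index; order is preserved.
  return settled.map((s) => (s.status === 'fulfilled' ? s.value : null));
}
```

Nested use is plain promise composition: an inner call's result is just another promise handed to the outer call, with throttling left to whatever dispatches the leaf work.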
…rtError consistency + doc accuracy
Round-2 adversarial review of the post-F1 code (fresh finders + skeptics) found
no new critical/major behaviour bugs (architectural convergence after the
dispatch-layer concurrency fix), but surfaced two real correctness items plus
a doc-accuracy cleanup pass.
createConcurrencyLimiter — prompt queue abort
- The limiter previously rejected queued jobs only lazily, inside pump(),
which re-runs from an in-flight thunk's .finally. So if an in-flight thunk
never settled (a buggy/hung future dispatcher), queued jobs would hang
forever even after abort. Production today never hits this because
subagent.execute always settles, but the limiter shouldn't lean on an
unenforced invariant. Now an `{once: true}` 'abort' listener drains the
queue the moment the signal fires. Adds a RED-confirmed regression test
(limit=1, in-flight = `new Promise(()=>{})`, abort → queued must reject
within 200ms).
settleToNullArray — abort error type consistency
- Was throwing `new Error('Workflow run aborted.')`, which `isAbortError()`
(utils/errors.ts) does NOT recognise — an aborted parallel/pipeline would
surface as a generic run failure. Now throws
`new DOMException('Workflow run aborted.', 'AbortError')` to match the
limiter, so the whole P2 abort path classifies uniformly.
Doc accuracy pass (review caught 5 stale strings)
- Tool descriptions accurately state the default cap is `min(16, cpus-2)`
(not a flat "16"), document both env-overrides
(QWEN_CODE_MAX_WORKFLOW_CONCURRENCY, QWEN_CODE_MAX_WORKFLOW_AGENTS), and
note that a thunk resolving to a non-JSON-serializable value also becomes
null at its index.
- makeParallelImpl docstring updated: parallel() rejects on invalid input OR
abort (not "only on abort" — that was contradicted by the array/function
validation right above).
- WorkflowTool fileoverview no longer claims "P1 sequential only".
- Orchestrator.run() comment updated to describe the actual P2 signal flow
(per-run limiter derived from abortOnTimeout, not P1's
"sandbox-level signal intentionally not exposed").
- Wall-clock rationale loses its stale "P1 sequential" framing.
151 workflow-suite tests pass (was 150 + 1 new lazy-abort regression);
tsc + eslint clean.
…u-floor + symmetric pipeline docs
Round-3 adversarial review found one confirmed factual error in the round-2 doc rewrite plus three real consistency gaps. No new behaviour bugs; the architecture has converged after the round-1 dispatch-layer fix and the round-2 prompt-queue-abort + AbortError consistency.
(1) wall-clock docstring: the round-2 rewrite claimed "even a long pipeline with the 1000-agent cap is bounded well under" 30 min. Arithmetically false: 1000 agents × 10-min subagent cap ÷ default 16-concurrency ≈ 10.4 hours, about 21× the wall clock. Rewritten honestly: the wall clock is a 0-token-hang backstop, NOT a precise cost cap; for cost control point operators at the env-overridable per-run cap (QWEN_CODE_MAX_WORKFLOW_AGENTS) and concurrency window (QWEN_CODE_MAX_WORKFLOW_CONCURRENCY).
(2) tool descriptions now show the actual default formula `max(1, min(16, cpus-2))`, including the outer max(1, ...) floor; without it, the displayed default would be -1 on a 1-CPU container even though the runtime clamps to 1.
(3) tool descriptions now document the non-JSON-serializable→null rule for pipeline() as well as parallel(): they share the same reviveInRealm code path (per-element JSON round-trip), so the asymmetric docs were inaccurate.
(4) settleToNullArray's AbortError comment is corrected: the round-2 commit overclaimed "uniform classification via isAbortError() at the WorkflowTool boundary". In reality the DOMException name is preserved at the HOST callsite inside the orchestrator, but vmAsync re-throws the script-visible rejection as a fresh `new Error(msg)` and the outer catch wraps it as WorkflowExecutionError, so isAbortError() at the tool boundary returns false either way. The DOMException is still useful as host-internal consistency, but the script-observability claim was wrong.
Declined this round (intentional, documented):
- F4/P2-R3-F2 wall-clock is plain Error, not AbortError: wall-clock IS a timeout, not an abort; semantically correct as-is. A unified abort surface is P3+ work.
- R2-MIN-1 limiter listener leak: only triggers if signal outlives the limiter and never aborts; production caller is per-run and always aborts.
- F6 new symbols not re-exported from index.ts: same internal-API decision as createConcurrencyLimiter.
- Various test-vacuity nits: tests already cover real failure modes.
151 workflow-suite tests pass; tsc + eslint clean.
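The worst-case arithmetic behind item (1) can be written out using the figures quoted in the commit (1000 agents, a 10-minute subagent cap, a window of 16, a 30-minute wall clock), under the assumption that every agent runs to its cap and the window stays full:

```latex
\frac{1000 \times 10\ \text{min}}{16} = 625\ \text{min} \approx 10.4\ \text{h},
\qquad
\frac{625\ \text{min}}{30\ \text{min}} \approx 20.8
```

So the theoretical upper bound on agent time is roughly twenty times the wall clock, which is why the wall clock is described as a hang backstop rather than a cost cap.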
… SECURITY comment
R1 review by @wenshao (4 [Suggestion] threads + 1 review-body comment). The fifth thread, [Critical] nested deadlock on a thunk-level limiter, was already caught and fixed in commit 0401ac88f by the dispatch-layer refactor. That commit was authored before the review posted, so the thread is on the pre-fix code; verified resolved by independent round-1 self-review fan-out that converged on the same finding from a different angle, and by a real-LLM E2E scenario (parallel-in-pipeline at concurrency=1 against qwen3-max).
settleToNullArray observability + abort docs (T2 / T3 wenshao):
- settled.map now logs the discarded rejection reason at debug level when a thunk rejects. Operators investigating a workflow that returned unexpected nulls can now disambiguate the four indistinguishable null paths (dispatch failure, 1000-cap, pipeline stage exception, non-JSON-serializable result) via the WORKFLOW debug logger; the contract to the script stays opaque.
- Docstring now explicitly explains the abort-responsiveness path: the apparent Promise.allSettled "wait for all to complete" is in practice "wait for all to reach an abort-aware rejection" because the dispatch signal is threaded all the way down to subagent.execute, and the limiter's separate abort listener drains the not-yet-started queue instantly.
env-override hard ceilings (T4 wenshao):
- HARD_MAX_AGENTS_PER_RUN_CEILING = 10000 caps QWEN_CODE_MAX_WORKFLOW_AGENTS.
- HARD_MAX_CONCURRENCY_CEILING = 64 caps QWEN_CODE_MAX_WORKFLOW_CONCURRENCY.
- Both clamp with a debug warning rather than silently dropping the override.
- Two RED-verified tests cover the over-ceiling clamp path (and a just-under value preserved).
- Not a security issue (env is operator-controlled), but stops a fat-finger =999999999 from silently uncapping the run.
reviveInRealm SECURITY comment (T5 wenshao):
- Added a SECURITY block warning future maintainers that the revival function MUST stay inside the vm init runInContext block. JSON / Array / Object here are vm-realm globals; extracting this textually-identical helper into a host-side utility would resolve those names against the host realm and silently reopen the T1/T8/T14 escape that the revival is designed to prevent. The textual identity to a host-side util is exactly the trap.
Declined this round (review-body 4 pipeline test-coverage sub-claims):
- pipeline abort, parallel+pipeline shared-limiter, pipeline 1000-cap, pipeline E2E. After the dispatch-layer refactor in 0401ac88f, parallel and pipeline mechanically share the SAME countedDispatch.limiter.run path, so the parallel-side abort / cap / concurrency tests cover the mechanism. Explicit per-shape sibling tests would not catch a regression that the parallel versions don't already catch.
145/145 workflow-suite tests pass; tsc + eslint clean. The 2 config tests fail locally only because the rebase pulled in #4844 (Agent Team)'s new proper-lockfile dependency which the symlinked node_modules doesn't have; CI resolves on fresh install.
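The clamp-with-warning rule for the env-override ceilings could be sketched as follows. The two constants and their values are taken from the commit message; `clampToCeiling` and its signature are hypothetical.

```typescript
// Ceiling values as stated in the commit message.
const HARD_MAX_AGENTS_PER_RUN_CEILING = 10000;
const HARD_MAX_CONCURRENCY_CEILING = 64;

// Hypothetical helper: clamp an already-validated override instead of dropping it.
function clampToCeiling(
  name: string,
  value: number,
  ceiling: number,
  warn: (msg: string) => void = () => {},
): number {
  if (value > ceiling) {
    warn(`${name}=${value} exceeds the hard ceiling; clamping to ${ceiling}`);
    return ceiling;
  }
  return value; // a just-under value is preserved as-is
}

console.log(clampToCeiling('QWEN_CODE_MAX_WORKFLOW_CONCURRENCY', 999999999, HARD_MAX_CONCURRENCY_CEILING)); // 64
console.log(clampToCeiling('QWEN_CODE_MAX_WORKFLOW_AGENTS', 9999, HARD_MAX_AGENTS_PER_RUN_CEILING));       // 9999
```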
81eb7b8 to
8a3f1cc
|
@wenshao reply to the review-body pipeline test-coverage comment (4 sub-claims): After the dispatch-layer refactor in Concrete mapping for each sub-claim:
If a future regression breaks pipeline specifically and not parallel, that would prove the mechanism diverged — at which point the sibling test becomes a real regression guard, not a structural duplicate. Happy to revisit if a concrete failure mode surfaces. |
R1 review summary (HEAD
|
| # | Finding | Disposition | Commit |
|---|---|---|---|
| T1 | [Critical] nested deadlock on thunk-level limiter | ✅ independently caught by in-house round-1 self-review fan-out + addressed | 71f2e98a8 (dispatch-layer refactor) |
| T2 | [Suggestion] settleAll silently discards rejection reasons | ✅ fixed: debugLogger.warn in settleToNullArray (settleAll was removed when concurrency moved to dispatch layer) | 8a3f1cc4c |
| T3 | [Suggestion] abort post-settlement comment misleads | ✅ fixed: settleToNullArray docstring now explicit about the in-flight cancellation chain (dispatch signal → subagent.execute → prompt rejection); limiter's separate abort listener drains the queue instantly | 8a3f1cc4c |
| T4 | [Suggestion] env override has no upper bound | ✅ fixed: HARD_MAX_CONCURRENCY_CEILING = 64, HARD_MAX_AGENTS_PER_RUN_CEILING = 10000; clamp + debug warning; 2 RED-confirmed tests | 8a3f1cc4c |
| T5 | [Suggestion] reviveInRealm needs SECURITY comment | ✅ fixed: SECURITY block warning future maintainers the function MUST stay inside the vm init runInContext block (textual identity to a host-side util is exactly the trap) | 8a3f1cc4c |
| RB | [review-body] 4 pipeline test-coverage gaps | ❌ declined: after the dispatch-layer refactor parallel/pipeline mechanically share the same countedDispatch.limiter.run path; per-shape tests would not catch a regression the parallel tests don't already catch (mapping in #issuecomment-4678701949) | n/a |
Notable convergence: wenshao's [Critical] T1 nested-deadlock finding and the in-house round-1 self-review fan-out (5 fresh finders + 2 skeptics per finding, no shared blindspot framing with this PR's author) converged on the same architectural defect from different angles. The dispatch-layer fix is verified by two RED-confirmed regression tests (parallel-in-pipeline + parallel-of-parallel at concurrency=1) plus a real-LLM E2E (qwen3-max via DashScope, nested pipeline-in-parallel returning [["N1"],["N2"]] end-to-end).
Test posture: 145/145 workflow-suite tests pass; tsc + eslint clean on all touched files.
DragonnZhang
left a comment
Re-review of new commits (81eb7b8..8a3f1cc). All 5 findings from the previous review have been properly addressed:
- Deadlock (Critical): Fixed by moving the concurrency throttle to the dispatch layer (`71f2e98a8`). Nested `parallel()`/`pipeline()` no longer deadlock; confirmed by the `nested fan-out` tests at concurrency=1.
- Silent discard: Fixed with `debugLogger.warn` in `settleToNullArray` (`8a3f1cc4c`).
- Abort comment: Fixed with explicit docstring explaining the post-settlement abort path and prompt signal-threaded rejection (`8a3f1cc4c`).
- No upper bound on env override: Fixed with hard ceilings (`HARD_MAX_AGENTS_PER_RUN_CEILING=10000`, `HARD_MAX_CONCURRENCY_CEILING=64`) and clamp-with-warn semantics (`8a3f1cc4c`).
- `reviveInRealm` security position: Fixed with explicit SECURITY comment block warning against extraction to host-side utility (`8a3f1cc4c`).
One new finding on the staggered-timing test (posted inline). The macOS CI failure (Test (macos-latest, Node 22.x)) is almost certainly caused by this test — GitHub's macos-14 runners have 3 CPU cores, making the concurrency limit 1.
qwen-code-ci-bot
left a comment
[Suggestion] workflow-sandbox.ts:596 — The workflow() stub error message still says "not supported in P1. Scheduled for a later phase." while parallel() and pipeline() stubs in the same block were updated to "is unavailable: ...". Two stubs use the new phrasing, one uses the old — inconsistency within a 20-line block.
reject(new Error(
'workflow() is unavailable: nested workflow invocation is not yet supported.'
));
— qwen3.7-max via Qwen Code /review
…ng + pipeline E2E
R2 review by @DragonnZhang (re-review after R1 push) and @qwen-code-ci-bot (post-fix review). Three real items addressed; one style/wording inconsistency in the bot's review body declined per the round-weighted bar.
T6 [Bug] staggering test deterministically fails on macOS-14 CI (DragonnZhang)
- The test asserted item 0 reaches stage 2 within 50ms while item 1's stage 1 takes 120ms. That timing assumption holds only at concurrency ≥ 2. On GitHub's macos-14 runners (3 CPU cores) cpu-derived concurrency = 1, FIFO forces all stage-1 dispatches to settle before any stage-2 starts, and the ~122ms s2-of-0 timestamp blows the threshold. The test passed on my workstation but blocks the macos-latest CI matrix: root cause of the failing `Test (macos-latest, Node 22.x)` check that downgraded the prior APPROVE.
- Replaced with a deterministic gate-based assertion that does NOT depend on wall-clock thresholds: force QWEN_CODE_MAX_WORKFLOW_CONCURRENCY=2, have item 1's stage 1 block on a Promise gate that only item 0's stage 2 can release. A staggered impl completes (item 0 advances while item 1 is held); a barrier impl deadlocks (item 0's stage 2 can't start until item 1's stage 1 finishes, which can't finish until item 0 reaches stage 2). Vitest timeout catches the barrier-deadlock case; the assertion `item0ReachedStage2 === true` is timing-free.
T7 [Suggestion] reviveInRealm catch silently sets null with no log (qwen-code-ci-bot)
- The R1 fix added debugLogger.warn for rejected thunks in settleToNullArray, but a thunk that *resolves* to a non-JSON-serializable value (BigInt / circular object) takes a different path through reviveInRealm's catch in the vm init script. Operators with debug logging on still couldn't distinguish "rejected" from "resolved-but-unserializable": symmetric observability was missing. The R1 audit should have caught the sibling and didn't.
- Added a host-side `logRevivalFailure(idx, reason)` hook to the bridge (debugLogger.warn host-side) and call it from reviveInRealm's catch with the coerced-to-string error message. The bridge contract is preserved: only primitive strings/numbers cross back; reviveInRealm itself stays inside the vm runInContext block per the SECURITY comment.
T8 [Suggestion] no pipeline() end-to-end test through WorkflowTool (qwen-code-ci-bot)
- This is the SAME finding wenshao raised in his R1 review-body, which I declined on a "parallel/pipeline share mechanism, symmetric tests redundant" basis. The bot's R2 raise provides specific mechanism evidence that breaks that argument: pipeline's vm wrapper uses `callPipeline.apply(null, arguments)` and `[items].concat(stages)` to spread the variadic stage list, a code path structurally distinct from parallel's single-argument call. A regression in the vm-to-host stage forwarding would not be caught by the parallel E2E. My R1 decline was based on incomplete grep; apologies, accepting now.
- Added a pipeline E2E test mirroring the parallel E2E shape: full stack drive through WorkflowTool → orchestrator → sandbox revival, asserting the chained stage results `[11, 21]`.
Declined this round (review-body):
- qwen-code-ci-bot's workflow() stub wording inconsistency ("not supported in P1" vs the new "is unavailable: ..." on parallel/pipeline). R2 style/nit per the round-weighted bar; no behavioural impact.
146/146 workflow-suite tests pass; tsc + eslint clean.
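The gate-based assertion from T6 can be illustrated with a small sketch. `createGate` and `staggeringCheck` are hypothetical names, and the two chains are hand-written stand-ins for a two-item, two-stage pipeline rather than the PR's actual test.

```typescript
// Hypothetical sketch of a timing-free staggering check using a Promise gate.
function createGate(): { wait: Promise<void>; open: () => void } {
  let open: () => void = () => {};
  const wait = new Promise<void>((resolve) => {
    open = resolve;
  });
  return { wait, open };
}

async function staggeringCheck(): Promise<boolean> {
  const gate = createGate();
  let item0ReachedStage2 = false;

  // Item 0: stage 1 finishes at once; stage 2 records progress and opens the gate.
  const chain0 = (async () => {
    await Promise.resolve('item0:stage1');
    item0ReachedStage2 = true;
    gate.open();
  })();

  // Item 1: stage 1 blocks until item 0 has reached stage 2.
  const chain1 = (async () => {
    await gate.wait;
    await Promise.resolve('item1:stage2');
  })();

  await Promise.all([chain0, chain1]);
  return item0ReachedStage2;
}
```

With independent chains this resolves to `true`. An implementation that waited for every stage 1 before starting any stage 2 would never open the gate, so it would hang and be caught by the test runner's timeout rather than by a millisecond threshold.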
R2 review summary (HEAD
|
| # | Reviewer | Finding | Disposition | Commit |
|---|---|---|---|---|
| T6 | @DragonnZhang | [Bug] staggering test fails on macos-14 (3-core → concurrency=1 → FIFO serializes) | ✅ fixed: replaced timing-based assertion with deterministic Promise gate (item 1's stage 1 blocks until item 0 reaches stage 2 and releases); barrier impl deadlocks, staggered impl completes. Forces QWEN_CODE_MAX_WORKFLOW_CONCURRENCY=2 for the test. | a87f589be |
| T7 | @qwen-code-ci-bot | [Suggestion] reviveInRealm catch silently sets null with no log, distinct from settleToNullArray rejection path | ✅ fixed: bridge logRevivalFailure callback; revival catch logs via host debugLogger.warn. Symmetric sibling of the R1 settleToNullArray logging fix. | a87f589be |
| T8 | @qwen-code-ci-bot | [Suggestion] no pipeline() E2E through WorkflowTool; pipeline vm wrapper is structurally different (apply(null, arguments) + [items].concat(stages)) | ✅ fixed: accepting on new mechanism evidence. This is the same finding @wenshao raised in R1 review-body that I declined on incomplete-grep "symmetric mechanism" argument. | a87f589be |
| RB | @qwen-code-ci-bot | [Suggestion] workflow() stub wording inconsistency ("not supported in P1" vs "is unavailable: ...") | ❌ declined: R2 style/wording per round-weighted bar; no behavioural impact. | n/a |
Self-postmortem: the staggering test's CI failure should have been caught locally. I had a deterministic gate-based replacement designed and consciously deferred it pending self-review feedback, then trusted my in-house finder's "not flaky on my workstation" verdict — exactly the inherited-blindspot trap an external reviewer is the only real check against. @DragonnZhang catching it on macos-14 is the right kind of independent verification. Similarly, T8 was @wenshao's R1 finding I declined too quickly; @qwen-code-ci-bot's R2 raise with mechanism evidence (pipeline-specific arguments-based forwarding) showed my decline rationale was based on incomplete grep — accepting now.
Test posture: 146/146 workflow-suite tests pass; tsc + eslint clean on all touched files. The macOS CI check that downgraded the prior approval should pass on this commit.
…ival error coercion
Post-R2 /simplify pass. Two findings that are pure cleanup of code added in
the R2 commit, with no scope drift:
(1) The R2 fix added `createDebugLogger('WORKFLOW')` to workflow-sandbox.ts,
duplicating the identical call in workflow-orchestrator.ts:21. Export the
sandbox-side instance and import it in orchestrator — single source of
truth, one fewer logger object retained for process lifetime. Direction is
natural (orchestrator already imports from sandbox; the reverse would be
circular).
(2) The reviveInRealm catch coercion `String((e && e.message != null) ?
e.message : e)` collapses to `String(e?.message ?? e)`. The truthy/null
distinction the original drew (treating empty-string message different from
the toString fallback) was not meaningful for a debug log line. Same
behaviour for any realistic error; less noise to read.
Deferred per the same self-review discipline that the R2 commit message
documented:
- `withEnv` helper to dedupe the 6-line env-var save/restore boilerplate
(now 3 sites in workflow-orchestrator.test.ts): real ~10 LOC win but
touches 2 pre-existing tests, out of R2 scope.
- Rename `bridge.logRevivalFailure` to a generic `bridge.warn(category,
msg)` for future vmAsync silent-reject logging: speculative; per the
altitude analysis, "zero rename cost when the second consumer arrives"
means deferring loses nothing.
146/146 workflow-suite tests pass; tsc + eslint clean.
🧪 Local runtime verification (built CLI + real model, interactive TUI via tmux): ✅ PASS
As a complement to the unit suite and the author's harness-level E2E, I verified this PR at the user-facing surface: built the PR head (
Each scenario was a prompt asking the driving model to call the
Environment: macOS (arm64), Node v22.22.2,
Scenarios & observations
Key evidence (pane captures)
Scenario 4, the security-critical revival (host-realm escape stays closed, content survives): { "runId": "wf_...", "phases": [], "logs": [], "result": { "probe": "undefined", "first": "OK" } }
Scenario 7, agent-cap env override at the surface:
Scenarios 3 & 8: the two operator-side WARN lines (added in R2) fire and are distinguishable in
Notes for the record
Conclusion: every probed behavior matches the PR description and code comments, including the adversarial probes. No regressions observed at the runtime surface. From a runtime-verification standpoint, LGTM for merge.
…'worktree'}) (#4721) (#5034)

* feat(core): Workflow P3 - agent({schema, agentType, model, isolation:'worktree'}) (#4721)

Adds the P3 dispatch options to the workflow runtime, completing the contract qwen-code's workflow tool matches against upstream Claude Code 2.1.168. P1/P2 stubs (workflow-sandbox.ts:508-527) are replaced with production paths routed through `SubagentManager.createAgentHeadless` so per-call model overrides go through `buildRuntimeContentGeneratorView` (provider routing), per-agent MCP servers / hooks get isolated lifecycles, and worktree-isolated subagents run against a rebound Config.

- agent({agentType: 'X'}) resolves against the declarative-agents registry (#4842 + #4996) via findSubagentByName; unresolved names throw "agent({agentType}): agent type 'X' not found" verbatim from upstream.
- agent({model: 'qwen3-max'}) is threaded into SubagentConfig.model so the runtime view sees it (modelConfigOverrides alone would only swap the model name within the existing provider's view).
- Workflow's disallowed-tool floor [SendMessage, ExitPlanMode] is unioned with the agentType's own disallowedTools so a permissive agentType cannot re-enable them for a workflow subagent.
- agent({isolation: 'worktree'}) provisions a fresh worktree via GitWorktreeService.createUserWorktree (slug agent-<7hex>, mirrors AgentTool 1849-1963), rebinds cwd/getTargetDir/getFileService/getWorkspaceContext on a prototype-chained Config override, and on completion auto-removes the worktree if clean or preserves the path + branch (appended to the result string) when the subagent left changes. Parent-dirty trees are refused with a clear error to avoid silently running the subagent against a stale HEAD.
- agent({isolation: 'remote'}) throws "agent({isolation:'remote'}) is not available in this build" verbatim (upstream 2.1.168 parity).
- agent({schema: S}) injects a per-call SyntheticOutputTool (existing tools/syntheticOutput.ts, AJV-backed) into a fresh per-subagent ToolRegistry built via rebuildToolRegistryOnOverride, then watches AgentEventEmitter TOOL_CALL/TOOL_RESULT events for `structured_output` invocations. A successful call's args are captured as the dispatch return value (object, not string); after two failed attempts the third failure aborts the dispatch and throws "subagent completed without calling StructuredOutput (after 2 in-conversation nudges)" verbatim. No agent-core.ts changes: the entire 2-nudge counter lives in the dispatch layer so the shared subagent loop is unaffected.

The sandbox's agent() wrapper now revives per-call object returns into the vm realm (JSON round-trip inside the vm runInContext block), closing the same T1/T8/T14 host-prototype-escape vector that P2's per-element revival closed for parallel/pipeline. Two new sandbox security tests (constructor-chain probe + non-JSON-serializable collapse) regress this.

WorkflowAgentResult widens from `string` to `string | object`; the fast-path (no agentType/model/isolation/schema) is preserved byte-for-byte to keep P1/P2 zero-overhead.

Tests: 159 workflow-suite tests + 217 adjacent (subagents / syntheticOutput / agent-override) all green. Real-LLM E2E follow-up planned (mirroring P2's 13/13 qwen3-max validation).

Related #4721 (parent design; multi-phase, not closed by this PR)
Related #4732 (P1 merged) #4947 (P2 merged) #4842 #4996 (declarative agents)

* chore(core): P3 self-review R1 - align worktree suffix wording + 6 test gaps

R1 of pre-push adversarial self-review on PR #5034 surfaced 6 confirmed findings across 6 diverse lenses (correctness / security / reuse-altitude / self-invariant / consumer-breakage / test-gaps). Each finding faced 2 independent skeptics defaulting to refuted=true; 6 survived majority challenge.

Source code:
- Worktree-preserved suffix wording now matches AgentTool's formatWorktreeSuffix (agent.ts:1700-1719) verbatim, including the `git worktree add <path> <branch>` recovery hint for the directory-removed-but-branch-preserved race.

Test gaps closed:
- schema-mode success after 1 nudge (round-2 args captured)
- schema-mode success after 2 nudges (round-3 args captured)
- schema-mode + agentType together: floor disallowedTools still unioned
- schema-mode caller-abort takes priority over the StructuredOutput terminal error (signal.aborted check at workflow-orchestrator.ts:489-490)
- override path dispose() runs in finally on the success path
- override path dispose() runs in finally on the terminate-mode-error path

Declined R1 finding: negative tests for invalid opt types (schema/model/agentType passed null/number/empty-string). Adding upfront type validation is scope creep: upstream does not, P1/P2 do not, and the workflow tool is model-authored where these inputs are extremely unlikely. Existing AJV / SubagentManager downstream errors are descriptive enough. Will revisit if R2 makes a stronger case.

166/166 tests pass (workflow suite + adjacent + workflow-orchestrator). typecheck + lint clean across packages/core, packages/cli, integration-tests, sdk, webui.

* chore(core): P3 self-review R2 - vm-realm opts revive + error-msg sanitize + 12 tests

R2 of pre-push adversarial self-review on PR #5034. 6 diverse-lens finders (60 agents, ~2.5M tokens, 24 min) over the R1-fix-applied code, with 2 independent skeptics defaulting to refuted=true. 12 confirmed survivors after adversarial verify; decisions below.

Security (FIX):
- agent() wrapper in workflow-sandbox.ts now JSON-revives agentOpts inside the vm runInContext block BEFORE passing them to the host dispatch.
  Closes a Proxy/inherited-getter escape that P3 introduced along with the user-supplied schema object: a script could have wrapped agentOpts.schema in a Proxy whose getter ran host-side code during SyntheticOutputTool construction / AJV compile. Same mechanism as args / parallel-result revival.
- runOverridePath now sanitizes opts.agentType through sanitizeForErrorMessage() (control chars → space) before interpolation into the "agent type 'X' not found" error message. Prevents a model-authored agentType containing CRLF / NUL from fragmenting a single-line error across log records / OTLP fields.

Reuse-altitude (FIX):
- Added JSDoc block to WorkflowWorktreeIsolation interface documenting each field's role for cleanup.

Test gaps (FIX, 12 new tests):
- agentType control-char sanitization regression
- dispose() runs in finally when subagent.execute throws
- isolation:'worktree' provision error branches (5): nested parent / git unavailable / not a git repo / parent dirty / createUserWorktree returns failure
- isolation:'worktree' cleanup branches (3): removeUserWorktree fails / branchPreserved race / removeUserWorktree throws; each preserves the worktree (or branch) with the right user-facing suffix
- combinations (2): model + isolation:'worktree' threads model AND provisions worktree; schema + isolation:'worktree' returns structured payload verbatim (preserved suffix only on string return)

Test infrastructure: vi.mock'd GitWorktreeService at the module level (partial mock; preserves the existing exports the unrelated worktreeCleanup.ts depends on) with a per-test beforeEach reset.

Declined R2 findings (kept the R1 line):
- [major] Schema parameter upfront validation: same scope-creep decline as R1. Upstream doesn't do it; AJV's downstream error is descriptive enough.
- [major] Worktree provision extracted to shared util with AgentTool: agreed in principle but out of P3 scope. A separate refactor PR should land that with AgentTool maintainers in the loop.

178/178 tests pass (workflow + adjacent suites). typecheck + lint clean across packages/core, packages/cli, integration-tests, sdk, webui.

* fix(core): address wenshao R1+R2 review on Workflow P3 (PR #5034)

Round 1 (15:41) + Round 2 (17:24) review from wenshao surfaced 7 inline findings across schema-mode dispatch correctness, worktree cleanup coverage, and error attribution. Each fix is paired with a regression test that was RED before the change landed.

T0 [Critical] Worktree leak when schema setup throws after provision

workflow-orchestrator.ts: outer try MOVED to start immediately after provisionWorkflowWorktree. Previously the try opened only after createSchemaConfigOverride / createSchemaModeState / signal listener attachment, so any throw in those three (a broken MCP server during the per-call ToolRegistry rebuild was the trigger wenshao cited) orphaned the just-provisioned worktree under .qwen/worktrees/.

Test: "isolation:'worktree' + schema setup throws → worktree is still cleaned up", which simulates createToolRegistry failure during createSchemaConfigOverride and asserts removeUserWorktree was called.

T1 [Critical] / T4 [H1] agentType + schema silently dead-ended

workflow-orchestrator.ts: schema-mode augmented config now (a) appends ToolNames.STRUCTURED_OUTPUT to baseConfig.tools when the allowlist is restricted (no '*' and doesn't already contain it), so prepareTools / getFunctionDeclarationsFiltered doesn't filter structured_output out of the subagent's surface; (b) preserves the resolved agentType's persona by APPENDING the schema-contract instruction block instead of replacing the systemPrompt outright. Replace remains only on the ephemeral no-agentType path where baseConfig.systemPrompt IS WORKFLOW_SUBAGENT_SYSTEM_PROMPT (schema variant is its strict superset; avoids two near-identical prompts).

Tests: structured_output appears in the allowlist alongside the agentType's existing tools; persona prompt is contained in the effective systemPrompt.

T2 [Suggestion] / T5 [M1] Parent-abort listener leaked per schema call

workflow-orchestrator.ts: named listener stored at outer scope, removed in the outer finally regardless of how the dispatch ended. Previous `{ once: true }` only auto-removed on actual parent abort; the happy-path schema dispatch (success capture / 3-failure abort fires the CHILD controller without the parent ever aborting) left the listener stuck on the per-run signal. With N schema calls per workflow, N listeners + N child-controller closures accumulated.

Test: 5 sequential schema dispatches over the same parent signal end with zero live listeners.

T6 [M2] Terminate mode misdiagnosed as nudge exhaustion

workflow-orchestrator.ts: schema path now distinguishes terminateMode before attributing failure to schema mode. TIMEOUT / MAX_TURNS / ERROR throw the existing "did not complete (terminate mode: X)" message that the non-schema path uses. Only the actual schema-failure cases produce schema wording, and those are split: attempts > 2 keeps the upstream-verbatim "(after 2 in-conversation nudges)" wording; attempts === 0 throws an accurate "no validation attempt — model produced plain-text content" instead of misleadingly citing nudges that never happened. (The existing 0-call test was updated to match the new accurate message; the 3-failure test retains the verbatim wording.)

Tests: parametric over TIMEOUT/MAX_TURNS/ERROR asserting "did not complete"; companion test pinning the verbatim wording to the 3-failure path.

T3 [Suggestion] Schema-mode JSON revival sentinel: clarified

workflow-sandbox.ts: added a block comment documenting that the JSON-round-trip + null-on-throw is a SECURITY backstop (errors-as-data convention from parallel/pipeline) rather than a contract path; it is unreachable in production schema mode because the host return is LLM tool_call args, always JSON-serializable. No behavior change.

Tests: 75/75 orchestrator + 111/111 sandbox/tool/limiter green. typecheck + lint clean across packages/core and packages/cli.

R1+R2 self-review commits (e1c5ec7 / 62624a9) precede this commit on the same branch; they predate wenshao's review and address distinct findings. Reviewer L1 (worktree-lifecycle unit coverage) is already closed by R2's 11 worktree tests.
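
The agentType sanitization described in the R2 notes above can be illustrated with a small sketch. The function name `sanitizeForErrorMessage` and the error wording come from the commit message; the body below is an assumed implementation (one space per ASCII control character), not the PR's actual code, and `agentTypeNotFound` is a hypothetical wrapper added only for the example.

```typescript
// Assumed implementation: replace each ASCII control character (including
// CR, LF, NUL and DEL) with a single space so the value stays on one line.
function sanitizeForErrorMessage(value: string): string {
  return value.replace(/[\u0000-\u001f\u007f]/g, " ");
}

// Hypothetical helper that builds the "not found" error quoted in the commit.
function agentTypeNotFound(agentType: string): Error {
  const safe = sanitizeForErrorMessage(agentType);
  return new Error(`agent({agentType}): agent type '${safe}' not found`);
}

// A model-authored name carrying CRLF can no longer split the log record.
const notFound = agentTypeNotFound("reviewer\r\nINJECTED LINE");
console.log(notFound.message);
```

Replacing rather than stripping keeps the original length visible in the message, which can make a tampered name easier to spot in logs.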
What this PR does

Implements phase P2 of the Dynamic Workflows port, building on the merged P1 (#4732). It adds concurrent fan-out primitives on top of P1's sequential `agent()`:

- `parallel(thunks)` runs thunks through a shared per-run sliding window (≤16 agents in flight) and resolves to a position-aligned array in which a thunk that throws becomes `null` at its index (errors-as-data); `parallel()` itself rejects only on abort.
- `pipeline(items, ...stages)` is parallel-of-chains: one chain per item, all sharing the same window, staggered with no inter-stage barrier, so item A can be in stage 3 while item B is still in stage 1. Stage callbacks receive `(prev, item, idx)`, with the first stage's `prev` being the item itself; a stage that throws or returns `null` drops that item to `null` and skips its remaining stages.

A 1000-agent-per-run cap funnels every `agent()` call (sequential, parallel, or pipeline) through one wrapped dispatch so a fan-out cannot bypass it. Both caps are env-overridable (`QWEN_CODE_MAX_WORKFLOW_CONCURRENCY`, `QWEN_CODE_MAX_WORKFLOW_AGENTS`), mirroring P1's `QWEN_CODE_MAX_WORKFLOW_SECONDS` and the existing `QWEN_CODE_MAX_BACKGROUND_AGENTS`.

A security-critical detail: `vmAsync`'s resolve path is verbatim (it does not re-wrap resolved values), so handing the host-realm result array to the script would reopen the host-realm escape (`out.constructor.constructor('return process')()` walks the host `Array.prototype` chain to the host `Function` constructor). The vm wrapper now revives the array per-element inside the vm realm with `JSON.parse(JSON.stringify(...))`, the same mechanism that makes `args` safe, so the script only ever sees vm-realm prototypes, and one non-serializable slot becomes `null` rather than crashing the whole batch.

Why it's needed

P1 shipped only sequential `agent()`, so a workflow that needs to run N independent subagents had to await them one at a time. The dominant cost in a multi-agent workflow is wall-clock time, and sequential dispatch leaves all the parallelism on the table. Dynamic workflows in the upstream tool (Claude Code) expose concurrent fan-out (`parallel`/`pipeline`) precisely because real workflows are map-reduce shaped: fan out research/extraction/verification across many items, then aggregate. P2 unlocks that map-reduce capability while keeping the resource ceilings (concurrency window + total-agent cap) that make it safe to expose to a model that authors the script on the fly. The errors-as-data contract (a failed subagent becomes `null`, not a thrown rejection that loses every sibling result) is what lets a script reason about partial failure the way the upstream `/deep-research` workflow does.

Reviewer Test Plan

How to verify

Unit + integration (154 tests across the workflow suite):

Real-LLM E2E: drives the built orchestrator + sandbox against a real model (qwen3-max via the DashScope OpenAI-compatible endpoint), so `agent()` calls are real subagent dispatches fanning out through `parallel()`/`pipeline()`. 13/13 checks, 14 live model calls:

`tsc --noEmit` and `eslint` are clean on all touched files.

Evidence (Before & After)

N/A: this is a non-user-visible backend change (workflow execution primitives). The verifiable evidence is the test output above.

Tested on

Environment

Local: `npx vitest` (unit/integration) + a standalone Node harness importing the built `packages/core/dist` orchestrator with a dispatch that calls the real model API (no full CLI bundle needed; the dispatch contract `(prompt) => Promise<string>` is identical to what `createProductionDispatch` provides via `AgentHeadless.getFinalText()`).

Risk & Scope

- … `Function` constructor. Closed by per-element in-realm JSON revival, with a regression test that was verified to FAIL against a verbatim wrapper. Token-cost amplification from fan-out is bounded by the 16-concurrency window + 1000-agent cap, both env-overridable.
- … `agent({schema})`), P4 (`/workflows` UI + progress), P5 (budget), P6 (resume) remain follow-ups per Feature Request: Port Dynamic Workflows / Ultracode from Claude Code 2.1.160 #4721. The E2E exercises P2's concurrency/revival/cap against a real model via a dispatch-level call; it does not re-exercise the full `AgentHeadless` tool-use loop (a P1 concern already covered). Windows/Linux verified by CI only.
- … `isWorkflowsEnabled()` gate (off by default); no change to `/swarm` or Agent Team.

Linked Issues
Related #4721 (parent design — multi-phase, not closed by this PR)
Related #4732 (merged P1 this builds on)
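
To make the fan-out contract described in this PR body concrete, here is a minimal, self-contained sketch of a sliding window, an agent cap, and the errors-as-data behaviour of `parallel()` and `pipeline()`. It is an illustration under stated assumptions, not the PR's implementation: `createLimiter`, `createAgent`, `fakeDispatch`, the cap error wording, and the choice to apply the window at the agent dispatch are all hypothetical, and the demo uses a window of 2 rather than 16 to stay small.

```typescript
type Thunk<T> = () => Promise<T>;
type Stage = (prev: unknown, item: unknown, idx: number) => Promise<unknown>;

// Hypothetical sliding window: at most `limit` tasks in flight per limiter.
function createLimiter(limit: number): <T>(task: Thunk<T>) => Promise<T> {
  let active = 0;
  const waiting: Array<() => void> = [];
  return async function run<T>(task: Thunk<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // slot passes directly to the next waiter
      else active--;
    }
  };
}

// Every agent() call goes through one wrapped dispatch, so the window and
// the total-agent cap apply to sequential, parallel and pipeline use alike.
function createAgent(
  dispatch: (prompt: string) => Promise<string>,
  concurrency: number,
  maxAgents: number,
): (prompt: string) => Promise<string> {
  const run = createLimiter(concurrency);
  let dispatched = 0;
  return (prompt) => {
    if (++dispatched > maxAgents) {
      return Promise.reject(new Error(`agent cap of ${maxAgents} exceeded`));
    }
    return run(() => dispatch(prompt));
  };
}

// Errors-as-data: a thunk that throws yields null at its own index.
function parallel<T>(thunks: Thunk<T>[]): Promise<Array<T | null>> {
  return Promise.all(
    thunks.map((thunk) => Promise.resolve().then(thunk).catch(() => null)),
  );
}

// One chain per item, no barrier between stages.
function pipeline(items: unknown[], ...stages: Stage[]): Promise<unknown[]> {
  return Promise.all(
    items.map(async (item, idx) => {
      let prev: unknown = item; // the first stage sees the item itself as prev
      for (const stage of stages) {
        try {
          prev = await stage(prev, item, idx);
        } catch {
          return null;
        }
        if (prev === null) return null; // null drops the item, skipping later stages
      }
      return prev;
    }),
  );
}

async function demo() {
  let inFlight = 0;
  let peak = 0;
  // Stand-in for a real subagent dispatch; prompts containing "fail" throw.
  const fakeDispatch = async (prompt: string): Promise<string> => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5));
    inFlight--;
    if (prompt.includes("fail")) throw new Error(`subagent failed: ${prompt}`);
    return prompt.toUpperCase();
  };
  const agent = createAgent(fakeDispatch, 2, 1000);
  const fanOut = await parallel([() => agent("a"), () => agent("fail b"), () => agent("c")]);
  const piped = await pipeline(
    ["x", "fail y"],
    (prev) => agent(String(prev)),
    (prev) => agent(`${prev}!`),
  );
  return { fanOut, piped, peak };
}
```

In this sketch a failed subagent leaves `null` in its slot while its siblings still resolve, and the in-flight count never exceeds the window even though all thunks start at once. The vm-realm revival step is intentionally left out; it concerns how results cross the sandbox boundary rather than how they are scheduled.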