feat(core): Workflow tool P1 — minimal node:vm sandbox + sequential agent() (#4721)#4732
Conversation
📋 Review SummaryThis PR implements P1 of the Dynamic Workflows / Ultracode port — a minimal 🔍 General Feedback
🎯 Specific Feedback🔵 Low
✅ Highlights
|
Cross-check against upstream (
|
wenshao
left a comment
There was a problem hiding this comment.
Fresh review pass at HEAD e767902. CI is green and this is not a self-PR, so no verdict downgrade applies.
Three new Critical issues are posted inline, plus three Suggestions. The headline is a host-realm escape on the success path of agent() (the always-on twin of the thrown-Error leak already flagged at workflow-sandbox.ts:293) — I confirmed it with a PoC that read process.env from inside a sandbox script with no error thrown.
For the record, the three earlier Critical inline comments still reproduce at this HEAD and were each re-verified by PoC — they block merge alongside the new findings:
- sandbox escape via thrown host
Errorobjects (workflow-sandbox.ts:293) deepNullProtosevering array prototypes sofor...of/.map/ spread on arrayargsthrow (workflow-sandbox.ts:175)- eager value-export of
WorkflowToolvsexport typesiblings (index.ts:119)
Good things confirmed: typecheck + lint + the 80 new tests all pass, the build is green, the node:vm Math/Date in-realm init is genuinely not subvertible, __proto__ args keys do not pollute host Object.prototype, and the anti-recursion guard (WORKFLOW in EXCLUDED_TOOLS_FOR_SUBAGENTS) holds on the tools: ['*'] path.
— claude-opus-4-8 via Claude Code /qreview
wenshao
left a comment
There was a problem hiding this comment.
Also noted (Nice to have, not posted as inline):
- 6 new files missing
@licenseheaders (workflow-sandbox.ts,workflow-orchestrator.ts,workflow-prompts.ts, and 3 test files). Onlyworkflow.tsand the two config test files have the standard header. - Stale comment in
config.workflows.test.ts:10andconfig.workflow-registration.test.ts:10: referencesconfig-session-env.test.tswhich does not exist in the codebase.
— qwen3.7-max via Qwen Code /review
E2E test report — real model (
|
| # | Behaviour | Result |
|---|---|---|
| 0 | Default-off gate (no env var) | ✅ tool absent from init.tools |
| 1 | QWEN_CODE_ENABLE_WORKFLOWS=1 gate |
✅ workflow in init.tools |
| 2 | Sequential phase() + 2× agent() + structured return |
✅ runId, phases, subagent dispatch, llmContent unwrap all verified end-to-end |
| 3 | Date.now() |
✅ throws — binary §axO verbatim |
| 4 | Math.random() |
✅ throws — binary §sxO verbatim |
| 5 | agent({schema}) |
✅ throws — "scheduled for P3" |
| 6 | parallel(...) |
✅ throws — "scheduled for P2" |
| 7 | pipeline(...) |
✅ throws — "scheduled for P2" |
All 7 P1-scope behaviours verified against a real model on real hardware. The 80 unit tests + 9920 core regression already in the PR cover the implementation seams; this report covers the integration path the user actually sees: model → tool description → tool_use → sandbox → subagent dispatch → tool_result → main-loop summary.
PR moving from draft → ready for review.
…ening (PR #4732 R1) Closes T1/T8/T14: thrown Errors and async-function Promises used to leak the host Function constructor through their prototype chains. PoC: agent('x').constructor.constructor('return process')() try { throw } catch(e) { e.constructor.constructor('return process')() } Build every async/sync global (agent, parallel, pipeline, workflow, budget, console, phase, log, args) inside the vm-realm via the existing init script. Host only exposes a primitive bridge that the init script reads once and deletes from globalThis. Both rejection and resolution paths cross the boundary as vm-realm values. Closes T2: deepNullProto used to setPrototypeOf(null) on array args, breaking for-of / .map / .filter / spread / destructuring. Replaced with vm-realm JSON.parse of an args string — arrays retain vm-realm Array.prototype methods. Closes T13: runtime allowlist on agent() opts catches typos like 'scema'. Closes T9/T16/T17: stripExportMeta now recognises //, /* */, and regex literals; throws on unbalanced braces instead of silently returning ''. Closes T6: validateArgs rejects functions, BigInt, circular refs, and over-deep nesting (previously functions silently disappeared). Closes T5: regression test for console.log/warn/error → getLogs routing.
Production callers use Config.createToolRegistry's registerLazy path which dynamic-imports './tools/workflow/workflow.js'. The barrel export at index.ts previously forced eager evaluation of the workflow.js → workflow-orchestrator → workflow-sandbox → node:vm module chain for every consumer of @qwen-code/core, even when workflows are disabled. Sibling tool exports (AgentTool, SkillTool) are type-only; align WorkflowTool with the same pattern. SDK consumers can still annotate types; instantiation happens through the registry, not the barrel.
…ls + failure context + defensive serialization (PR #4732 R1) Closes T10: runReasoningLoop returns terminateMode = CANCELLED|MAX_TURNS| TIMEOUT|ERROR rather than throwing. Without checking it, await agent(...) resolved to '' on user cancel and the workflow kept looping. Now check getTerminateMode() after execute() and throw on non-GOAL — mirrors AgentTool's existing handling. Closes T11: workflow subagents previously ran with runConfig: {} (no max_turns / max_time_minutes guard) and tools: ['*'] without disallowedTools. A single agent() could loop the model indefinitely. Bound to 50 turns / 10 minutes; add disallowedTools: [SEND_MESSAGE, EXIT_PLAN_MODE] to mirror upstream Tg8 — defense in depth with the §XmO system prompt. Closes T19: phases / logs accumulated before a script failure used to be discarded with the sandbox instance. WorkflowExecutionError carries them through the rejection so the user-visible display can show what ran. Closes T12 / T18: defensive serialization. A successful workflow returning a BigInt or circular value used to be reported as 'Workflow failed: Converting circular structure to JSON' because JSON.stringify was inside the try block. safeStringifyResult / safeStringifyDisplayPayload degrade to a clear placeholder so a serialization issue doesn't masquerade as a run failure. Closes T4: regression test for ToolErrorType.EXECUTION_FAILED assertion. Closes T7: vi as vitest alias removed (now matches every other test file).
…on-env reference (PR #4732 R1) Closes T20: 6 of 9 new workflow files were missing the standard @license Apache-2.0 header. Add Qwen-style header (matches sibling tools/agent/agent.ts and others) to: workflow-sandbox.ts, workflow-sandbox.test.ts, workflow-orchestrator.ts, workflow-orchestrator.test.ts, workflow-prompts.ts, workflow.test.ts. Closes T21: config.workflows.test.ts and config.workflow-registration.test.ts both contained 'mirrors config-session-env.test.ts' in a setup comment, but that file does not exist in the repo. Drop the dangling reference.
Round 1 — all 21 findings addressedCommits (4): Full P1 test suite: 108 passed (was 80 — 28 new regression cases). Full core suite: 9960 passed / 4 skipped / 0 failed. typecheck: 0 errors. lint: clean on workflow files (only pre-existing unrelated errors in Critical (all 4 unique roots fixed; 7 inline threads closed)
Suggestions (all 12 fixed)
Nits (review-body)
Architecture noteThe unified vector for T1/T8/T14 is "any host object handed to the script keeps its host prototype chain." Round 1's fix moves the cross-realm boundary to a primitive-only host bridge: the init script reads a host object exposed at The same pattern also lets P2's |
SEC-C1: deep-null-proto + hardenClosure blocks args/closure realm-escape PoC. SEC-C2: vm 30s timeout kills sync infinite loops. UP-C1: agent() throws on unsupported opts (schema/isolation/model/agentType). UP-I1: keep verbatim subagent system prompt comment. ARCH-C1: thread AbortSignal into buildProductionDispatch → subagent.execute(). UP-C2: llmContent carries script result verbatim; metadata moves to returnDisplay. SEC-I1: add WORKFLOW to EXCLUDED_TOOLS_FOR_SUBAGENTS to prevent recursive fan-out. SEC-I2: cap logs[] at 10 000 lines with a truncation marker. REUSE-I1: use ToolErrorType.EXECUTION_FAILED in workflow error returns. TST: add security PoC tests, unique-runId, dispatch-rejection, llmContent-unwrap. TST-I1: remove setter/getter tautology test from config.workflows.test.ts.
- Extract WORKFLOW_SUBAGENT_SYSTEM_PROMPT into workflow-prompts.ts - Lift buildProductionDispatch() into exported createProductionDispatch(config, signal?) - WorkflowOrchestrator constructor now takes dispatch directly: (dispatch: WorkflowAgentDispatch) - Remove WorkflowOrchestratorOptions interface - WorkflowToolOptions.orchestratorOverrides replaced by WorkflowToolOptions.dispatch - WorkflowToolInvocation.execute() calls createProductionDispatch() when no override is set - Tests updated: orchestrator tests inject dispatch directly; production-dispatch tests moved to createProductionDispatch describe block
…hrow, phases cap, test fidelity (P1)
…ected closures (P1)
…on, consolidate WorkflowAgentResult (P1)
…ion + args threading
Second adversarial review surfaced 5 more findings; all fixed: 1. **convertAgentFiles passthrough corrupted nested objects.** Round-1 fixed serializeSubagent to skip emitting mcpServers/hooks (the local yaml-parser collapses nested values to '[object Object]'). The plugin-import passthrough I added to claude-converter.ts had the same mangle bug — a CC plugin agent with nested mcpServers got written to disk as 'mcpServers:\n - [object Object]'. Now skips both fields via a NESTED_FIELDS_NOT_ROUND_TRIPPABLE set. 2. **parseEffort accepted 0 and negative integers.** Docs say 'positive integer' and parseMaxTurns already rejects <= 0. parseEffort didn't, accepting '0' / '-5' / 0 / -1 as valid efforts. Aligned both helpers and added tests for the boundary values. 3. **Function name collision: permissionModeToApprovalMode.** packages/core/src/ tools/agent/agent.ts has a module-private permissionModeToApprovalMode that maps the qwen PermissionMode enum to ApprovalMode enum (different domain entirely). My new exported helper used the same name; IDE auto-import would silently return undefined for every qwen enum value. Renamed the export to claudePermissionModeToApprovalMode and updated 4 callers. 4. **color: 'auto' round-trip asymmetry.** Round-1 added a test pinning that the parser preserves the legacy 'auto' sentinel. But serializeSubagent already had a pre-existing 'skip emit when auto' branch, so a parse → serialize → parse cycle dropped the sentinel. The CLI helpers (shouldShowColor / getColorForDisplay) already treat 'auto' identically to undefined, so normalizing 'auto' to undefined at parse time has no downstream effect AND makes round-trip cleanly idempotent. 5. **Converter approvalMode precedence diverged from loader.** When a CC source file had both 'permissionMode: bypassPermissions' AND 'approvalMode: default', the convertClaudeAgentConfig path wrote 'approvalMode: yolo' (bridge wins). The loader's rule is 'approvalMode wins over bridge'. Extended ClaudeAgentConfig with an approvalMode field and gated the bridge emit on 'source approvalMode is unset', aligning convert and load precedence. Refs #4821 #4721 #4732
…l bridge (PR #4732) Post-R4 /simplify pass. Four review angles converged on one finding: the T40 manual AbortController-bridging at the call site re-implements `createChildAbortController` from `packages/core/src/utils/abortController.ts:61`, which is already the project-idiomatic helper for this exact pattern (used at agent-headless.ts:231, agent-interactive.ts:157, agent-core.ts:603 & 952). The replacement collapses 5 lines of imperative listener wiring at `workflow.ts:execute()` head + 1 line of finally cleanup into a single `createChildAbortController(signal)` call, plus inherits the helper's hardenings: - WeakRef on the parent (so a long-lived caller signal doesn't pin the child controller) - Auto-removal of the parent listener when the child fires (covers both the wall-clock-fire path and the natural-completion path) - Default 50-listener cap via `setMaxListeners` No behavior change at the API boundary — the wall-clock `abortOnTimeout` contract and the test assertions for T40's two cases (controller IS aborted on wall-clock; controller is NOT aborted on normal completion) all still hold. The /simplify "altitude" finding (push the bridging into the orchestrator) is deferred — that would change WorkflowOrchestrator's constructor/run signature and is outside this PR's scope. Also trims a redundant inline comment at workflow-orchestrator.ts:172 (the `abortOnTimeout: req.abortOnTimeout` line; the field's type comment at line 71-81 already explains the contract). 114/114 workflow suite tests pass, typecheck silent.
…Controller refactor (PR #4732) The previous commit moved the bridging logic into the helper, so the inline comments restating the helper's contract (parent forwarding, already-aborted fast path, WeakRef, auto-removal) became redundant — that's the helper's job to document. Compress to the load-bearing semantics: the child controller sees both caller-driven and wall-clock-driven aborts, and `finally` cancels stragglers on normal completion. No code change.
…contract Three quality findings from the round-3 adversarial pass (after rounds 1+2 fixed 9 issues each). All low-severity but cheap; the runtime was correct, the file-on-disk + public-API surface needed tightening. 1. **Stale runConfig.max_turns duplicated on every write.** Top-level maxTurns was promoted in 9b903ee but the serializer still emitted runConfig.max_turns verbatim. A file with both fields kept both indefinitely. The runtime already prefers top-level (so behavior is correct), but a third-party tool or future viewer reading the legacy nested field would act on stale data. Prune nested max_turns from the serialized runConfig when top-level maxTurns is set; drop the runConfig block entirely if pruning leaves it empty. Three new tests pin the prune, the drop-empty-block path, and the no-top-level fallback. 2. **14 internal symbols leaked into the public package API.** The PR re-exported every helper from agent-frontmatter-schema.ts via subagents/index.ts. grep against packages/cli + packages/vscode-ide-companion finds zero cross-package consumers — both internal users (subagent-manager, claude-converter) already import directly from the schema file. Locking constant names like EFFORT_VALUES and parseEffort in the public API would constrain follow-up PRs (e.g. when js-yaml lands and the schema shape changes). Removed the entire export block; left a comment explaining the choice so a future PR knows the bar for re-introducing it. 3. **convertClaudeAgentConfig silently dropped a standalone approvalMode.** Round-2 gated the permissionMode bridge on !claudeAgent.approvalMode for precedence parity with the loader, but never added the explicit copy for approvalMode-only callers. Result: caller passes {approvalMode: 'plan'} → output has no approvalMode at all. The in-tree convertAgentFiles path recovered via the unowned-key passthrough, but exported direct-call surface needs the explicit copy. Two new tests pin both the standalone case and the precedence-when-both case. Also: pinned yaml-parser limitation tests (added in the round-3 e2e independent check) document that the lightweight parser cannot round-trip nested mcpServers/hooks; documentation + types comments updated to call out the carve-out so users know the field works only for flat forms until js-yaml lands. Refs #4821 #4721 #4732
|
@wenshao friendly ping for re-review when you have a moment — all 42 review threads across R1–R5 are now resolved, CI 8/8 green at HEAD |
Reduces this PR to ship only the fields that have an end-to-end runtime path today: permissionMode (bridges to existing approvalMode), maxTurns (wires into runConfig.max_turns), and a tightened color allowlist. Everything else gets deferred to follow-up PRs that first land the prerequisite infra they need. Removed: - effort (no model-layer effort param in qwen providers) - mcpServers / hooks (need nested-aware YAML parser — yaml-parser.ts is hand-rolled and only handles 1 level) - memory (qwen's auto-memory has no user/project/local scope distinction) - isolation (workflow PR #4732 owns the runtime) - initialPrompt (needs --agent CLI flag, no main-session-agent infra) - skills (needs SkillManager consumption of config.skills) This drops 7 fields from SubagentConfig + types.ts, deletes their parser / serializer / test code, deletes the unused enum constants and helpers from agent-frontmatter-schema.ts (EFFORT_VALUES, EFFORT_ALIASES, MEMORY_VALUES, ISOLATION_VALUES, parseEffort, isMemory, isIsolation), and trims the user docs to the supported field set with a pointer to the design doc for the deferred fields. Why ship the design doc alongside such a narrow v1: the full reverse-engineering record in docs/declarative-agents-port.md (DL7/Ig5/GN/_Y constants, error messages, schema parity decisions, coordination matrix with workflow PR #4732) stays load-bearing for the follow-up PRs. A status table at the top discloses what's deferred and why. Net: -559 LOC across 7 files. Final PR is permissionMode bridge + maxTurns wiring + color allowlist + design reference doc + the yaml-parser nested-limitation pin tests added during round-3 review. Refs #4821 #4721 #4732
✅ Local runtime verification — P1 Workflow toolI verified this PR end-to-end at runtime (not just the unit suite) on the true post-merge state, as a merge reference. Setup — tested the merged tree, not the branch in isolationThe branch is 38 commits behind current 1. Unit suites — 118 / 118 pass2. Typecheck — all 13 changed files clean
3. Real runtime exercise (
|
| Area | Verified |
|---|---|
Sequential agent() |
A → B call order; B's prompt receives A's output (chaining); args / phase / log captured in order; runId = wf_<hex>; script return value passed through verbatim |
| Determinism | Date.now(), new Date(), Math.random() all throw (resume-safety guarantee) |
| P1 gating | parallel(), pipeline(), workflow(), agent({schema|model|isolation|agentType}), budget.spent() all throw with clear "scheduled for P2/P3/P5" messages; an unknown agent() opt (typo scema) is rejected by the runtime allowlist |
| Sandbox isolation | globalThis.constructor.constructor('return process')(), thrown-Error realm escape, and agent-result realm escape are all blocked (process is not defined); process / require are undefined inside the sandbox |
export const meta |
leading meta is stripped and the body runs; an unbalanced-brace meta is refused (throws, no silent truncation) |
| Robustness | non-JSON args (a function) is rejected; a script failure preserves phases / logs and surfaces a clean message (no realm-leaked Error: prefix); the async wall-clock timeout fires at 800 ms and aborts the linked abortOnTimeout controller |
4. Gating (off by default) — confirmed by the config suites
Not registered unless isWorkflowsEnabled(); QWEN_CODE_ENABLE_WORKFLOWS=1 enables it; QWEN_CODE_DISABLE_WORKFLOWS=1 is a kill-switch that overrides; ToolNames.WORKFLOW is in EXCLUDED_TOOLS_FOR_SUBAGENTS, so a subagent can't recursively invoke the workflow.
Verdict
✅ Functionally verified at runtime. The node:vm sandbox, sequential dispatch, determinism guarantees, P1 forward-compat stubs, and the host-realm isolation boundary all behave as designed on the merged tree. No functional blockers found; safe to merge from a runtime-verification standpoint. (Scope: behavior/runtime verification — not a full line-by-line security audit of the sandbox.)
DragonnZhang
left a comment
There was a problem hiding this comment.
Code Review: PR #4732 — feat(core): Workflow tool P1
Deterministic checks
| Check | Result |
|---|---|
TypeScript (tsc --noEmit) |
PASS (clean) |
| ESLint (8 changed source files) | PASS (clean) |
| Tests (5 workflow test files, 118 tests) | PASS (118/118) |
Overall assessment
This is a well-engineered PR with thorough security hardening, comprehensive test coverage, and clean three-layer architecture (WorkflowTool / WorkflowOrchestrator / createWorkflowSandbox). The six rounds of iterative security fixes addressing node:vm prototype-escape vectors are impressive — the PoC regression tests in workflow-sandbox.test.ts are some of the best sandbox-hardening test coverage I have seen in a PR of this scope.
No blocking issues found. Five non-blocking observations below.
Findings (5 non-blocking)
F1. [Security - Suggestion] No script size cap in validateToolParamValues
validateToolParamValues checks only that script is a non-empty string but imposes no maximum length. While mitigated by the ask permission gate and model output token limits, a defense-in-depth cap (e.g., 256KB) would prevent a hostile or hallucinating model from submitting a multi-megabyte script that pressures the vm parser or the stripExportMeta brace walker.
// packages/core/src/tools/workflow/workflow.ts — validateToolParamValues
const MAX_SCRIPT_LENGTH = 256_000; // ~256KB
if (params.script.length > MAX_SCRIPT_LENGTH) {
return `WorkflowTool: script exceeds max length of ${MAX_SCRIPT_LENGTH} characters.`;
}Not blocking — the opt-in feature gate and ask permission provide meaningful mitigation already.
F2. [Security - Informational] Async microtask leak after wall-clock timeout
After Promise.race rejects on wall-clock timeout, an in-script async microtask loop (e.g., (async () => { while(true) await 0 })()) continues consuming microtasks on the host event loop. node:vm provides no mechanism to halt async execution once started. The PR documents this limitation and correctly identifies worker_threads isolation as the proper fix for a future phase.
Acceptable for P1 given the opt-in gate and the 30-minute default cap. Should be tracked as a known limitation for the issue (#4721).
F3. [Security - Informational] No memory cap for vm context
node:vm does not enforce a memory limit. A script like const a=[]; while(true) a.push(new ArrayBuffer(1e8)) can OOM the host process. The PR documents this and suggests --max-old-space-size as an operator mitigation. Same disposition as F2 — acceptable for P1, tracked for future worker_threads migration.
F4. [Correctness - Nit] Copyright header inconsistency
config.workflows.test.ts and config.workflow-registration.test.ts carry Copyright 2025 Google LLC while the other six new workflow source/test files carry Copyright 2025 Qwen. Cosmetic only — worth aligning if there is a project convention on copyright holder.
F5. [Performance - Informational] Per-call dynamic import in dispatch closure
createProductionDispatch does await import('./agent-headless.js') inside every dispatch invocation. Node caches module resolution after the first import, so subsequent calls resolve in a single microtask tick. Negligible overhead in practice. No action needed — the dynamic import pattern is load-bearing for test mockability.
Reverse audit (things NOT changed that might need attention)
EXCLUDED_TOOLS_FOR_SUBAGENTSinagent-core.ts: Correctly updated withToolNames.WORKFLOW. Anti-recursion test inconfig.workflow-registration.test.tslocks this in.tool-names.ts: BothToolNames.WORKFLOWandToolDisplayNames.WORKFLOWadded. No migration mapping needed (new tool).index.tsbarrel export: Changed toexport typeonly — avoids eager loading of thenode:vmmodule chain. Correct.config.ts: Feature gate follows the establishedisForkSubagentEnabledpattern (env var + settings + kill switch). Registration usesregisterLazy. Correct.
Summary
| Axis | Verdict |
|---|---|
| Correctness | PASS — architecture is sound, cross-realm Error handling is correct, AbortController chain (caller -> child -> dispatch -> subagent) is well-designed |
| Security | PASS with notes — six rounds of prototype-escape hardening with PoC regression tests; node:vm limitations are honestly documented; opt-in gate is the primary defense |
| Performance | PASS — no hot-path concerns; log/phase caps bound memory; wall-clock timer is unref'd |
| Test coverage | STRONG — 118 tests across 5 files covering security PoCs, edge cases, error paths, and injection seams |
LGTM with optional improvements (F1 script size cap being the most actionable).
…#4732 R7 F4) Both files were derived from a template (the stale `config-session-env.test.ts` reference cleaned up in R1 T21) and retained the upstream `Copyright 2025 Google LLC` header. The other six new workflow source/test files in this PR carry `Copyright 2025 Qwen`. Align for same-PR consistency. Per DragonnZhang R7 F4. No behavior change; header text only.
R6 + R7 summary — HEAD
|
| # | File | Disposition |
|---|---|---|
| T45 | sandbox.ts:478 (Object.keys allowlist) | ❌ out of scope — explicit throws cover P1 surface (you noted this); preemptive hardening belongs in P2/P3 PR |
| T46 | sandbox.ts:36 (stripExportMeta leading comments) | ❌ out of scope — reverses my R4 T33 design decision (clean SyntaxError as constraint signal) |
| T47 | orchestrator.ts:46 (stale cause JSDoc) |
❌ out of scope — R6+ inline-doc polish defaults to overthinking; folded into future doc-pass |
| T48 | workflow.test.ts:57 (external-abort layer test) | ❌ out of scope — R6+ test-coverage additions default to overthinking; helper-contract tests already in utils/abortController.test.ts |
@DragonnZhang — LGTM with optional improvements
Thanks for the independent code review. The "LGTM with optional improvements" plus the deterministic checks (tsc clean / eslint clean / 118/118) is great signal alongside @wenshao's runtime verification.
Disposition of F1-F5:
| # | Disposition |
|---|---|
| F1 (script size cap) | ❌ out of scope — hostile-provider hardening; you noted opt-in gate + ask permission are the meaningful mitigation already |
| F2 (async microtask leak) | ✅ documented as known P1 limitation in #4721 (#4721) — worker_threads migration tracked for future phase |
| F3 (vm memory cap) | ✅ documented as known P1 limitation in #4721 alongside F2 |
| F4 (copyright header inconsistency) | ✅ fixed in a008fd648 — both config.workflows.test.ts and config.workflow-registration.test.ts now Copyright 2025 Qwen to match the other 6 PR-new files |
| F5 (per-call dynamic import) | ✅ acknowledged — your assessment ("No action needed — dynamic import is load-bearing for test mockability") is correct |
Status: HEAD a008fd648, CI 8/8 green, all 46 review threads across R1-R7 resolved, known P1 limitations documented in parent #4721. Pending: re-review or CHANGES_REQUESTED dismissal to unblock merge.
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
| `Workflow subagent did not complete (terminate mode: ${mode}).`, | ||
| ); | ||
| } | ||
| return subagent.getFinalText(); |
There was a problem hiding this comment.
[Suggestion] subagent.getFinalText() can return '' (empty string) on GOAL mode — the dispatch silently returns "" with no warning.
Looking at agent-core.ts, finalText starts as '' and is only updated when a round produces text output. If the model returns GOAL with no text (tool-only execution, content-filtered response, empty output), the script receives "" from await agent("..."). A subsequent JSON.parse(agentResult) or string operation would throw a confusing error with no indication that the root cause was an empty agent response.
| return subagent.getFinalText(); | |
| const text = subagent.getFinalText(); | |
| if (!text) { | |
| throw new Error( | |
| `Workflow subagent '${opts?.label ?? 'workflow-agent'}' completed with GOAL mode but returned empty text. This usually indicates a model-side issue (content filter, empty response, or tool-only execution).`, | |
| ); | |
| } | |
| return text; |
— qwen3.7-max via Qwen Code /review
| protected override validateToolParamValues( | ||
| params: WorkflowParams, | ||
| ): string | null { | ||
| if (typeof params.script !== 'string' || params.script.length === 0) { |
There was a problem hiding this comment.
[Suggestion] No upper bound on script size. The only check is script.length === 0 — a multi-megabyte model-generated script is accepted and passed to new vm.Script(), which compiles synchronously and is NOT covered by the 30s vm timeout (that only applies to runInContext).
A model hallucination loop could produce a 5MB+ script that blocks the event loop during V8 compilation with no timeout and no abort signal.
| if (typeof params.script !== 'string' || params.script.length === 0) { | |
| if (typeof params.script !== 'string' || params.script.length === 0) { | |
| return 'WorkflowTool: `script` parameter is required and must be a non-empty string.'; | |
| } | |
| const MAX_SCRIPT_LENGTH = 256_000; | |
| if (params.script.length > MAX_SCRIPT_LENGTH) { | |
| return `WorkflowTool: script is ${params.script.length} chars, exceeds limit of ${MAX_SCRIPT_LENGTH}. This usually indicates a model generation loop.`; | |
| } |
— qwen3.7-max via Qwen Code /review
| return /[{[(,;:=!&|?+\-*/%^~<>]/.test(prev); | ||
| } | ||
|
|
||
| import * as vm from 'node:vm'; |
There was a problem hiding this comment.
[Suggestion] import * as vm from 'node:vm' appears after ~120 lines of code (two exported functions: stripExportMeta and isRegexContext). Every other file in the codebase — including the two sibling files added in this PR — clusters all imports at the top, immediately after the license header.
ESM hoists imports so this works at runtime, but it is visually inconsistent. Move to line 7, directly below the license header.
— qwen3.7-max via Qwen Code /review
| return this.workflowsEnabled; | ||
| } | ||
|
|
||
| setWorkflowsEnabled(enabled: boolean): void { |
There was a problem hiding this comment.
[Suggestion] workflowsEnabled is the only mutable (non-readonly) feature flag on Config. Every other *Enabled field (cronEnabled, forkSubagentEnabled, computerUseEnabled, experimentalZedIntegration) is private readonly. The setWorkflowsEnabled setter is the only public setter for any feature flag on the entire Config class.
This breaks Config's post-construction immutability contract. If a caller mutates this flag mid-session, isWorkflowsEnabled() could return a different value than what the tool registry was built with (constructed once at startup via registerLazy).
Make workflowsEnabled readonly like its siblings. Tests that need workflowsEnabled: true can pass it via ConfigParameters (as the existing "honors workflowsEnabled passed via ConfigParameters" test already demonstrates).
— qwen3.7-max via Qwen Code /review
| signal?: AbortSignal, | ||
| ): WorkflowAgentDispatch { | ||
| return async (prompt, opts) => { | ||
| const { AgentHeadless, ContextState } = await import('./agent-headless.js'); |
There was a problem hiding this comment.
[Suggestion] await import('./agent-headless.js') is inside the returned closure, so every agent() call re-executes the dynamic import(). Node's module cache prevents re-loading from disk, but each call still goes through the async module-resolution pipeline. Every other consumer (subagent-manager.ts, background-agent-resume.ts) uses a static import.
Hoist to the factory level so it resolves once per createProductionDispatch call:
| const { AgentHeadless, ContextState } = await import('./agent-headless.js'); | |
| export function createProductionDispatch( | |
| config: Config, | |
| signal?: AbortSignal, | |
| ): WorkflowAgentDispatch { | |
| const modulePromise = import('./agent-headless.js'); | |
| return async (prompt, opts) => { | |
| const { AgentHeadless, ContextState } = await modulePromise; |
This preserves the test-mock property (dynamic import lets mocks swap the module) while eliminating repeated resolution.
— qwen3.7-max via Qwen Code /review
| const llmText = (result.llmContent as Array<{ text: string }>)[0]!.text; | ||
| expect(llmText).toBe('(workflow returned no value)'); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] Missing test for dispatchController.abort() in the finally block at workflow.ts:174. Every test passes new AbortController().signal but never inspects whether the internal child controller created via createChildAbortController is actually aborted after execution.
A regression that removes the finally { dispatchController.abort() } block would pass all existing tests. In production this leaks: every successful workflow would leave the dispatch controller un-aborted, meaning any straggler subagent keeps burning tokens until its own max_time_minutes limit.
Add a test that captures the dispatch signal and asserts signal.aborted === true after execute() resolves on both success and failure paths.
— qwen3.7-max via Qwen Code /review
Verification Report — local runtime build & driveVerdict: PASS ✅ (one documentation/wiring note below) Built the branch from a clean checkout and drove the real bundled CLI in Environment
What was verified (all at the live CLI surface)1. Gating — off by default 2. Gating — opt-in via env var 3. Kill switch 4. Happy path — single sequential agent { "runId": "wf_31ebe7f9247fa960", "phases": ["Greet","Wrap"],
"logs": ["starting","agent replied: PONG"],
"result": { "agentSaid": "PONG", "ok": true } }Real subagent dispatched and its value flowed back into the script. The LLM received the unwrapped 5. Headline feature — two agents in order { "runId": "wf_3663a65961a670d4", "phases": ["A","B"],
"logs": ["got 1 then 2"], "result": { "first": "1", "second": "2" } }Two subagents executed sequentially, phases and log captured in order. ✅ Probes (off the happy path)
Findings
Out of scope (per PR, not exercised)
Verified on Linux via a tmux-driven bundled CLI against the real model backend. Evidence = the tool-result payloads captured above, reproduced verbatim from the running app. |
tanzhenxin
left a comment
There was a problem hiding this comment.
Second approval based on the converged review history: the multi-round sandbox-escape findings were all addressed, wenshao's runtime verification passed (118/118 unit, 33/33 runtime, isolation PoCs), and the feature is off by default so the blast radius is contained.
…+ maxTurns wiring + color allowlist (CC 2.1.168 parity) (#4842) * docs: add declarative agents port design doc for #4821 * feat(core): add declarative agent frontmatter schema constants and 9 new fields Adds agent-frontmatter-schema.ts as the single source of truth for the CC 2.1.168 declarative-agent enum constants (EFFORT_VALUES, PERMISSION_MODE_VALUES, MEMORY_VALUES, ISOLATION_VALUES, COLOR_VALUES) and lenient parsers (parseEffort, parseMaxTurns, parseBackground, parseStringOrArray) that mirror DL7's warn-and-drop posture. Also adds a permissionModeToApprovalMode bridge to map CC's permission modes onto qwen-code's existing approvalMode semantics. Extends SubagentConfig with 9 optional fields carried verbatim from CC frontmatter: permissionMode, effort, maxTurns, skills, initialPrompt, memory, isolation, mcpServers, hooks. Runtime semantics for the metadata-only fields are deferred to follow-up PRs; this lands the data carrier. Refs #4821 #4721 #4732 * feat(core): parse and serialize 9 new declarative agent frontmatter fields Extends parseSubagentContent and serializeSubagent to round-trip the 9 CC 2.1.168 fields previously added to the SubagentConfig type. Parsing follows DL7's lenient warn-and-drop posture (vs the existing strict throw posture used for approvalMode), so a Claude Code agent file with an invalid optional field still parses with that field dropped — matching CC behavior so users can drop CC agent files into .qwen/agents/ unchanged. Adds a permissionMode → approvalMode bridge: when frontmatter has permissionMode but no approvalMode, the bridge resolves the approvalMode at parse time using the same mapping as claude-converter.ts. If both are set, approvalMode wins (already-explicit values take precedence over inferred ones). Tests: - 22 new parser cases covering happy paths, invalid drops, permissionMode bridge precedence, color allowlist (with auto sentinel preserved), and loose-validation passthrough for mcpServers / hooks. - 9 new serializer cases asserting round-trip and omit-when-unset behavior. Refs #4821 #4721 #4732 * feat(core): promote top-level maxTurns to runConfig.max_turns at convert time Wires the new top-level `maxTurns` field into the existing runtime configuration pipeline so the value declared in agent frontmatter actually limits the agent's turn budget at run time. When both the top-level `maxTurns` and the legacy nested `runConfig.max_turns` are set, the top-level field wins (more specific and matches the CC frontmatter shape upstream agent files use). When only the nested field is set, behavior is unchanged. Refs #4821 #4721 #4732 * refactor(core): use shared permissionMode bridge and parseStringOrArray in claude-converter Replaces the inline claudeToQwenMode table and the local parseStringOrArray helper in claude-converter.ts with the shared exports from agent-frontmatter-schema.ts. One source of truth for the CC ↔ qwen mapping keeps the two import paths (.qwen/agents/*.md frontmatter and Claude plugin import) in sync — when the upstream CC schema changes, only one table needs updating. Adds explicit tests for the bridge mapping (six known modes + unknown fallback). The fallback now happens at the call site rather than inside the table, but the observable behavior is unchanged. Refs #4821 * docs(subagents): document CC-compatible frontmatter fields Adds a Claude Code Compatibility Fields section to the user-facing subagents reference, covering the 9 new frontmatter fields landed in this PR. The intent is for users with existing Claude Code agents to know they can drop the files into `.qwen/agents/` and have them parse identically. Refs #4821 * refactor(core): reuse parseBackground in declarative agent loader Replaces the inline boolean-or-string lenient parse for the background field with the shared parseBackground util added in the schema module. Same observable behavior; one fewer place to keep in sync when the upstream shape changes. Refs #4821 * fix(core): address adversarial review of declarative agents PR Four self-review findings caught and fixed before opening the PR: 1. **mcpServers/hooks round-trip claim was broken.** The local yaml-parser only formats one level of nesting, so serializing an agent with nested mcpServers or hooks would mangle the value into '[object Object]'. The fields are still carried verbatim in memory (read path is fine), but the serializer no longer emits them — losing the field through '/agents' edits is strictly safer than corrupting it. Documented the limitation in docs/users/features/sub-agents.md. 2. **approvalMode throw vs permissionMode lenience asymmetry.** The pre-existing approvalMode parser threw on invalid values, killing the entire agent file. With the new permissionMode bridge, a CC-imported file with 'permissionMode: bypassPermissions' + 'approvalMode: tpyo' would reject the file instead of dropping the typo and using the bridge. Demoted approvalMode to the same DL7-parity warn-and-drop posture all the new fields use; the bridge now runs when approvalMode is invalid. 3. **parseEffort missed numeric strings.** CC's DL7 falls back to parseInt for non-enum effort strings so 'effort: "5"' (quoted YAML) round-trips like 'effort: 5'. Added the same fallback and tests for floats / partial numeric strings / valid numeric strings. 4. **convertAgentFiles stripped 6/9 of the new fields.** The Claude plugin import path read frontmatter into a fixed ClaudeAgentConfig shape, so 'effort', 'maxTurns', 'initialPrompt', 'memory', 'isolation', 'mcpServers', and 'background' were silently dropped when installing a CC plugin agent. Build the rewritten frontmatter from the original keys first, then overlay the converter's transformations for the keys it owns — unknown CC fields now passthrough verbatim, future-proofing against later CC additions. Refs #4821 #4721 #4732 * fix(core): round-2 self-review fixes for declarative agents PR Second adversarial review surfaced 5 more findings; all fixed: 1. **convertAgentFiles passthrough corrupted nested objects.** Round-1 fixed serializeSubagent to skip emitting mcpServers/hooks (the local yaml-parser collapses nested values to '[object Object]'). The plugin-import passthrough I added to claude-converter.ts had the same mangle bug — a CC plugin agent with nested mcpServers got written to disk as 'mcpServers:\n - [object Object]'. Now skips both fields via a NESTED_FIELDS_NOT_ROUND_TRIPPABLE set. 2. **parseEffort accepted 0 and negative integers.** Docs say 'positive integer' and parseMaxTurns already rejects <= 0. parseEffort didn't, accepting '0' / '-5' / 0 / -1 as valid efforts. Aligned both helpers and added tests for the boundary values. 3. **Function name collision: permissionModeToApprovalMode.** packages/core/src/ tools/agent/agent.ts has a module-private permissionModeToApprovalMode that maps the qwen PermissionMode enum to ApprovalMode enum (different domain entirely). My new exported helper used the same name; IDE auto-import would silently return undefined for every qwen enum value. Renamed the export to claudePermissionModeToApprovalMode and updated 4 callers. 4. **color: 'auto' round-trip asymmetry.** Round-1 added a test pinning that the parser preserves the legacy 'auto' sentinel. But serializeSubagent already had a pre-existing 'skip emit when auto' branch, so a parse → serialize → parse cycle dropped the sentinel. The CLI helpers (shouldShowColor / getColorForDisplay) already treat 'auto' identically to undefined, so normalizing 'auto' to undefined at parse time has no downstream effect AND makes round-trip cleanly idempotent. 5. **Converter approvalMode precedence diverged from loader.** When a CC source file had both 'permissionMode: bypassPermissions' AND 'approvalMode: default', the convertClaudeAgentConfig path wrote 'approvalMode: yolo' (bridge wins). The loader's rule is 'approvalMode wins over bridge'. Extended ClaudeAgentConfig with an approvalMode field and gated the bridge emit on 'source approvalMode is unset', aligning convert and load precedence. Refs #4821 #4721 #4732 * fix(core): round-3 self-review fixes — drift, API hygiene, converter contract Three quality findings from the round-3 adversarial pass (after rounds 1+2 fixed 9 issues each). All low-severity but cheap; the runtime was correct, the file-on-disk + public-API surface needed tightening. 1. **Stale runConfig.max_turns duplicated on every write.** Top-level maxTurns was promoted in 9b903ee but the serializer still emitted runConfig.max_turns verbatim. A file with both fields kept both indefinitely. The runtime already prefers top-level (so behavior is correct), but a third-party tool or future viewer reading the legacy nested field would act on stale data. Prune nested max_turns from the serialized runConfig when top-level maxTurns is set; drop the runConfig block entirely if pruning leaves it empty. Three new tests pin the prune, the drop-empty-block path, and the no-top-level fallback. 2. **14 internal symbols leaked into the public package API.** The PR re-exported every helper from agent-frontmatter-schema.ts via subagents/index.ts. grep against packages/cli + packages/vscode-ide-companion finds zero cross-package consumers — both internal users (subagent-manager, claude-converter) already import directly from the schema file. Locking constant names like EFFORT_VALUES and parseEffort in the public API would constrain follow-up PRs (e.g. when js-yaml lands and the schema shape changes). Removed the entire export block; left a comment explaining the choice so a future PR knows the bar for re-introducing it. 3. **convertClaudeAgentConfig silently dropped a standalone approvalMode.** Round-2 gated the permissionMode bridge on !claudeAgent.approvalMode for precedence parity with the loader, but never added the explicit copy for approvalMode-only callers. Result: caller passes {approvalMode: 'plan'} → output has no approvalMode at all. The in-tree convertAgentFiles path recovered via the unowned-key passthrough, but exported direct-call surface needs the explicit copy. Two new tests pin both the standalone case and the precedence-when-both case. Also: pinned yaml-parser limitation tests (added in the round-3 e2e independent check) document that the lightweight parser cannot round-trip nested mcpServers/hooks; documentation + types comments updated to call out the carve-out so users know the field works only for flat forms until js-yaml lands. Refs #4821 #4721 #4732 * refactor(core): shrink declarative agents PR to vertically-sliced v1 Reduces this PR to ship only the fields that have an end-to-end runtime path today: permissionMode (bridges to existing approvalMode), maxTurns (wires into runConfig.max_turns), and a tightened color allowlist. Everything else gets deferred to follow-up PRs that first land the prerequisite infra they need. Removed: - effort (no model-layer effort param in qwen providers) - mcpServers / hooks (need nested-aware YAML parser — yaml-parser.ts is hand-rolled and only handles 1 level) - memory (qwen's auto-memory has no user/project/local scope distinction) - isolation (workflow PR #4732 owns the runtime) - initialPrompt (needs --agent CLI flag, no main-session-agent infra) - skills (needs SkillManager consumption of config.skills) This drops 7 fields from SubagentConfig + types.ts, deletes their parser / serializer / test code, deletes the unused enum constants and helpers from agent-frontmatter-schema.ts (EFFORT_VALUES, EFFORT_ALIASES, MEMORY_VALUES, ISOLATION_VALUES, parseEffort, isMemory, isIsolation), and trims the user docs to the supported field set with a pointer to the design doc for the deferred fields. Why ship the design doc alongside such a narrow v1: the full reverse-engineering record in docs/declarative-agents-port.md (DL7/Ig5/GN/_Y constants, error messages, schema parity decisions, coordination matrix with workflow PR #4732) stays load-bearing for the follow-up PRs. A status table at the top discloses what's deferred and why. Net: -559 LOC across 7 files. Final PR is permissionMode bridge + maxTurns wiring + color allowlist + design reference doc + the yaml-parser nested-limitation pin tests added during round-3 review. Refs #4821 #4721 #4732 * refactor(core): revert round-X residuals to hit v1's actual minimum LOC The previous shrink left round-1/2/3 fixes in place that were originally for the wider 9-field scope; with the carry-only fields gone they're no longer needed. Reverting them all back to pre-PR behaviour: - approvalMode strict throw (round-1 demoted to lenient; defer the symmetry fix to a separate PR if wanted) - runConfig.max_turns prune on serialize (round-3 cleanup; defer) - color: 'auto' normalised to undefined (round-2 round-trip fix; defer) - claude-converter NESTED_FIELDS_NOT_ROUND_TRIPPABLE + CONVERTER_OWNED_KEYS + passthrough loop (round-1/2 — all for the deleted carry-only fields) - ClaudeAgentConfig.approvalMode + precedence gating (round-3) - parseBackground in shared schema module (the dedup wasn't pulling weight) - parseStringOrArray in shared schema module (same) The shared bridge claudePermissionModeToApprovalMode + the disambiguated name stay — converter and loader use the same map and the name collision risk is real. Net: -336 LOC vs the prior shrink commit. * docs: sync user-facing docs to actual v1 behaviour The previous shrink left two stale claims in docs/users/features/sub-agents.md that referenced behaviour reverted in the round-X-revert commit: - maxTurns row claimed 'the legacy nested value is pruned from the on-disk file on save to avoid two sources of truth' — the prune was reverted - color row claimed 'auto is accepted on read and normalised to undefined for round-trip parity' — the normalisation was reverted; auto is preserved as-is for backward compat Updated both rows to match what the parser actually does now. The design doc (docs/declarative-agents-port.md) is unchanged: its top-line disclaimer already labels the body as reference material for follow-up PRs, so the references to the wider plan (D7 bridge, P1-P4 phases) are accurate-as-reference even though some details are deferred. * fix(core): use Map.get in claudePermissionModeToApprovalMode (prototype-chain footgun) The PERMISSION_MODE_TO_APPROVAL_MODE table was a plain object accessed with `[key]`. Calling claudePermissionModeToApprovalMode('__proto__') walked the prototype chain and returned Object.prototype — a non-string value that violated the declared return type. The current loader path was safe because isPermissionMode() filtered prototype-key strings before the bridge ran, but the exported function itself was a latent footgun for any future caller that didn't know to pre-filter. Switched the lookup table to a Map so prototype keys cannot be reached. Added explicit tests for __proto__, constructor, hasOwnProperty, toString. Refs #4821 * fix(core): address round-2 review on declarative agents PR #4842 Two findings: 1. `serializeSubagent` wrote both `permissionMode` and the bridge-derived `approvalMode` into the same frontmatter block. On the next load the parser takes `approvalMode` (explicit wins over bridge), so `permissionMode` silently became dead frontmatter — a user editing it later in the file would be ignored. Skip the `permissionMode` emit when `approvalMode` is also being written; `permissionMode` still round-trips when it's the user's only intent. Two new tests pin both branches. 2. `subagents/index.ts` had a NOTE comment referencing `EFFORT_VALUES` and `parseEffort` as example schema helpers that intentionally aren't re-exported. Those symbols were removed during the v1 scope shrink (637b8b7) and don't exist in the current module. Updated the comment to reference `claudePermissionModeToApprovalMode`, `parseMaxTurns`, and `isPermissionMode`, which are the actual current contents. Sibling-drift note: the same dual-emit pattern exists for `maxTurns` vs `runConfig.max_turns` — round-X revert (70a876d) explicitly deferred the `runConfig.max_turns` prune. Not addressed here per that earlier decision. Refs #4842
What this PR does
Implements P1 of the Dynamic Workflows / Ultracode port — a minimal
Workflowtool that runs a model-authored JavaScript script in anode:vmsandbox withargs,phase,log, and sequentialagent()globals. The tool is registered behindisWorkflowsEnabled()(off by default; opt-in via setting orQWEN_CODE_ENABLE_WORKFLOWS=1).Architecture is three layers with clean injection seams:
WorkflowTool(packages/core/src/tools/workflow/workflow.ts) —BaseDeclarativeToolsubclass that owns the tool registration, params validation, and error marshallingWorkflowOrchestrator+createProductionDispatch(config, signal?)(packages/core/src/agents/runtime/workflow-orchestrator.ts) — owns run lifecycle, dispatch construction, and wrapsAgentHeadlessfor the production code pathcreateWorkflowSandbox(opts)(packages/core/src/agents/runtime/workflow-sandbox.ts) — owns thenode:vmcontext, hardened globals, and the determinism stubs (Date.now/Math.randomthrow)P2/P3/P5 forward-compatibility seams are already in
SandboxOptions(parallel?,pipeline?,budget?) so future phases can wire real implementations without re-opening the sandbox's already-reviewed security surface.The subagent system prompt is verbatim from the claude-code 2.1.160 binary §XmO constant — five bullets including the "return ONLY the raw JSON" instruction critical for P3 schema mode.
Why it's needed
See #4721 for the full motivation and phased plan. TL;DR: this adds a third tier above the existing
/swarmtool (#3433) and Agent Team (#2886, in progress) — model-authored JS that orchestrates many subagents through a small set of primitives, matching upstream'sAgent+Workflowcoexistence.P1 is the minimum shippable skeleton:
Workflowtool is registered and gated; the JS sandbox executes scriptsagent()only —parallel/pipeline/workflow/budgetare throwing stubs that surface a clear "scheduled for PN" error (notReferenceError)Date.now()/Math.random()/new Date()throw to guarantee resume determinismLocal design artifacts (not committed; they live in
.qwen/design/per repo convention) document the full upstream alignment study:.qwen/design/dynamic-workflow-alignment-claude-code-2.1.160.md(788 lines) — per-axis upstream findings, qwen-code fit matrix, gap analysis.qwen/design/dynamic-workflow-alignment-claude-code-2.1.160-liveprobe-delta.md— API surface confirmation against the real/deep-researchworkflow script captured from~/.claude/projects/<session>/.qwen/design/p1-implementation-plan.md— the task-by-task plan this branch executesReviewer Test Plan
How to verify
Notable test categories under
workflow-sandbox.test.ts:stripExportMeta— 8 cases covering brace balancing, string-literal escapescreateWorkflowSandbox— args verbatim, Date.now/Math.random throw, top-level return capturecreateWorkflowSandbox security— 18 cases including realm-escape PoCs that have been confirmed-blocked against hostprocess(coversargs.constructor.constructor,Math.__proto__,Math.toString.constructor,Math.abs.constructor,Date.constructor, array-args[].constructor,budget.spent.constructor, the 30s vm timeout, log/phases caps, agent({schema/isolation/model/agentType}) throws, and the injection seams forparallel/pipeline/budget)Evidence (Before & After)
N/A — no user-visible UI change in P1. The tool is registered but the model only emits a
Workflowcall when explicitly prompted; no Footer//workflowsUI lands until P4.Tested on
Environment (optional)
Node v22.21.1, npm 10.x.
Risk & Scope
Main risk or tradeoff: the JS sandbox is
node:vm(zero dep), notisolated-vm.node:vmis not a true V8 isolate — six prototype-escape paths were identified and blocked over six review rounds (PoCs in the security test file act as regressions). Two architectural limitations remain documented and not fixed:node:vmdoes not enforce a memory cap — a hostile script can OOM the host. Operators should configure--max-old-space-size; future phases may move toworker_threads.EW6constant.Neither limitation is unique to qwen-code — claude-code 2.1.160 (binary inspected) ships the same trade-off.
Not validated / out of scope:
parallel/pipeline/workflow/budgetare throwing stubs in P1 (clear "scheduled for PN" errors, notReferenceError). Real implementations land in P2/P5 — injection seams inSandboxOptionsare pre-wired so neither phase needs to modify the sandbox.agent({schema})throws with a clear "scheduled for P3" message. Until P3 lands, scripts that rely onStructuredOutput(including upstream's/deep-research) will fail at the first such call — by design./workflowsslash command or progress UI yet. TheBackgroundTasksPilldoes not show workflow runs. All planned for P4.resumeFromRunId. Planned for P6.Breaking changes / migration notes: None. The tool is off by default (
isWorkflowsEnabled() === falseunless setting or env var is set). All changes are additive; no existing tool, slash command, or config field is modified.Decisions deferred to maintainer sign-off (listed in issue #4721 §"Decisions needed before P1"):
node:vm(P1) vsisolated-vm(potential P2+)workflowKeywordTriggerEnableddefault: false (qwen) vs true (upstream)QWEN_CODE_MAX_TOKENS_PER_WORKFLOWin P5.qwen/workflows/vs.claude/workflows/Resolves the implementation half of #4721 — only P1 lands here; P2–P7 will be separate PRs per the phased plan.