Port the Dynamic Workflows feature (announced by Anthropic in Claude Code 2.1.160) to qwen-code as a third tier of multi-agent execution, complementary to the existing /swarm tool (#3433) and the in-progress Agent Team (#2886).
What a dynamic workflow is
A model-authored JavaScript script that runs in a sandbox and orchestrates many subagents through a small set of primitives. The model writes the script on-the-fly for the user's request; the runtime sandboxes it; subagents fan out through the existing headless-agent path; one aggregated result returns to the main conversation.
The full API surface (all confirmed against upstream's published /deep-research workflow script and binary strings):
Concrete example from Anthropic's shipped /deep-research workflow:
    phase('Scope')
    const scope = await agent('Decompose this research question into 5 search angles...', { schema: SCOPE_SCHEMA })

    const searchResults = await pipeline(
      scope.angles,
      angle => agent(SEARCH_PROMPT(angle), { phase: 'Search', schema: SEARCH_SCHEMA }),
      searchResult => parallel(
        novelUrls(searchResult).map(src => () => agent(FETCH_PROMPT(src), { phase: 'Fetch', schema: EXTRACT_SCHEMA }))
      ),
    )

    phase('Verify')
    const voted = await parallel(rankedClaims.map(claim => () =>
      parallel(Array.from({ length: 3 }, (_, v) => () =>
        agent(VERIFY_PROMPT(claim, v), { phase: 'Verify', schema: VERDICT_SCHEMA })
      )).then(verdicts => ({ ...claim, survives: verdicts.filter(v => v.refuted).length < 2 }))
    ))

    phase('Synthesize')
    const report = await agent('Merge confirmed claims; write report...', { schema: REPORT_SCHEMA })
    return { question, ...report }
Triggers
Four invocation paths matching upstream:
Keyword in prompt — including the word workflow in a single-turn user prompt opts that turn into a workflow.
/effort ultracode (session-only mode) — once enabled, the model auto-spawns workflows for substantive turns until session ends.
Saved slash command — workflow scripts at .qwen/workflows/ (project) or ~/.qwen/workflows/ (user) are surfaced as slash commands (the /deep-research invocation style).
Direct Workflow tool call by the model with inline script or name.
Hard caps (verbatim from upstream)
Concurrent agents: min(16, os.cpus().length - 2) per workflow
Total agent calls: 1000 per workflow lifetime
Schema-mismatch nudges: 2 in-conversation nudges per agent (binary line 307789 — error message reads "subagent completed without calling StructuredOutput (after 2 in-conversation nudges)"). Distinct from the stall-retry counter VOK = 5 which fires on no-progress agents, not schema validation failures.
Single nesting level for workflow() (workflow inside a child workflow throws)
Same-session resume only (in initial scope; cross-session is a v1.5 candidate, see decisions below)
Why is this needed?
What dynamic workflow does that swarm and Agent Team don't
Keep /swarm for the simple "fan out 5 tasks, aggregate" case where a script is overkill.
Keep Agent Team for persistent peer-to-peer collaboration with mailboxes.
Add Workflow for the rich multi-phase orchestration case — matching upstream's Agent + Workflow coexistence.
Use cases this unlocks
Adversarial verification (deep-research-style 3-vote claim refutation against schema-validated claims)
Map-reduce with typed outputs (extract findings as JSON objects, not freeform text)
Multi-stage pipelines without per-stage barriers (item A in stage 3 while item B is still in stage 1 — dramatically reduces wall-clock for long fan-outs)
Resumable long runs (pause/restart without re-burning the agents that already completed)
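The barrier-free pipeline behaviour listed above can be sketched roughly as follows. This is an illustration only, not the shipped implementation: `pipelineSketch` is a hypothetical name, plain async functions stand in for `agent()` calls, and the null-on-failure behaviour follows the errors-as-data semantics described elsewhere in this issue.

```javascript
// Hypothetical sketch: each item walks through every stage on its own,
// so a fast item can reach stage 3 while a slow one is still in stage 1.
async function pipelineSketch(items, ...stages) {
  const runOne = async (item) => {
    let value = item;
    for (const stage of stages) {
      value = await stage(value);
    }
    return value;
  };
  // No per-stage barrier: all items start at once and progress independently.
  // A stage that throws turns only that item's result into null.
  return Promise.all(items.map((item) => runOne(item).catch(() => null)));
}
```

A real runtime would also route every stage call through the concurrency limiter; the sketch leaves that out to keep the control flow visible.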
The only genuinely new subsystem is a node:vm-based JS sandbox (~150–250 LOC) that injects the documented globals and stubs Date.now / Math.random for resume determinism. Everything else is wiring.
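A minimal sketch of such a sandbox, assuming only Node's built-in `node:vm` module. The function name `runSandboxed`, the `__fixedNow` helper global, and the fixed values are hypothetical; the real sandbox may pin time and randomness differently.

```javascript
const vm = require('node:vm');

// Hypothetical sketch: the context exposes only the injected globals.
// process / require / fs are never added, so a script cannot reach them
// by name. node:vm is still not a security boundary.
function runSandboxed(source, globals, { fixedNow = 0, timeoutMs = 1000 } = {}) {
  const context = vm.createContext({ ...globals, __fixedNow: fixedNow });
  // Pin the context's own Date.now / Math.random so a replayed run
  // observes the same values as the original run.
  vm.runInContext('Date.now = () => __fixedNow; Math.random = () => 0.5;', context);
  // The timeout option bounds synchronous execution only.
  return vm.runInContext(source, context, { timeout: timeoutMs });
}
```

For example, `runSandboxed('typeof process', {})` evaluates inside the fresh context, where `process` was never injected.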
Additional context
Local design artifacts
A full design pass has been completed against upstream @anthropic-ai/claude-code@2.1.160, with the API surface live-verified against the actual /deep-research workflow script captured from ~/.claude/projects/<session>/workflows/scripts/ plus binary strings cross-check. The following artifacts will be committed alongside the implementation PR:
Main design doc (788 lines): .qwen/design/dynamic-workflow-alignment-claude-code-2.1.160.md — per-axis upstream findings, qwen-code fit matrix (11 subsystems), gap analysis, phased plan, risks
Live-probe delta (~280 lines): .qwen/design/dynamic-workflow-alignment-claude-code-2.1.160-liveprobe-delta.md — API surface confirmation, with binary line citations and /deep-research source code references
E2E test plan (812 lines): .qwen/e2e-tests/dynamic-workflow-alignment.md — per-phase scenarios with stub-server harness
Every API signature in the design is confirmed against either Anthropic's published /deep-research workflow source or the binary's literal tool-description constant.
Phased implementation plan
Each phase is independently shippable behind a feature gate:
P3: agent({ schema, agentType }) → forced StructuredOutput contract + 2-nudge in-conversation retry on schema validation failure; agentType resolves against the declarative-agents registry from #4821 (graceful fallback to the built-in workflow subagent if agentType is unset or fails to resolve). Estimated ~300 LOC.
P4: Extract meta ({name, description, whenToUse?, phases?[{title, detail?, model?}]}) before stripping it from the script source (replaces P1's stripExportMeta with extractAndStripMeta); /workflows slash command + phase-tree progress UI + BackgroundTasksPill KIND_NAMES extension.
Decisions needed before P1 (qwen-specific divergences from upstream)
These need maintainer sign-off because they leak into settings shape, env var names, and UI surface:
JS sandbox choice: node:vm (zero dep, weak isolation; matches upstream's defense-in-depth posture, since the script has no fs/shell surface by design) vs isolated-vm (strong V8-level isolate, but adds a native dep that breaks the "pure JS, single npm install" property and requires prebuilt binaries for all platforms). Recommendation: node:vm for v1, escalate to isolated-vm only if a hostile-script threat model emerges.
Keyword trigger default: upstream sets workflowKeywordTriggerEnabled = true. qwen-code users skew cost-sensitive across DashScope / OpenAI-compatible providers. Recommendation: default false on qwen-code, require explicit opt-in via settings. Diverges from upstream.
Per-run token ceiling: upstream has only agent-count caps (16 concurrent / 1000 total) — no programmable token limit. Recommendation: add a qwen-only QWEN_CODE_MAX_TOKENS_PER_WORKFLOW env var, default unset, as a safety net. Diverges from upstream.
Saved-workflows directory: .qwen/workflows/ + ~/.qwen/workflows/ (matches qwen convention) vs .claude/workflows/ + ~/.claude/workflows/ (matches upstream literal paths and aids portability of shared workflows across tools). Recommendation: .qwen/ paths; copy-pasted upstream workflows need a path adjustment.
Cross-session resume: upstream is strictly same-session only. qwen-code already ships cross-session background-agent resume (Add background agent resume and continuation #3739, 1068 LOC). Recommendation: ship same-session in v1 to match upstream, extend to cross-session in v1.5 as a qwen-only improvement.
Ultracode persistence semantics: upstream's ultracode: true is session-only (does not persist across sessions). qwen-code's settings layer has no "session-only key" concept today. Recommendation: match upstream — require re-toggle per session, document it.
Risks
JS sandbox security — node:vm is not a true security boundary. Mitigated by the fact that the script has no fs/shell surface by design (only spawned agents do I/O); we enforce this by not injecting process / require / fs / child_process into the context. Escalation path to isolated-vm if a real attacker model emerges. Workflow scripts are model-authored, not arbitrary user input.
Token cost amplification — 16× concurrency × deep nesting × shared plan billing can burn quota fast. Mitigated by agent-count caps (16 / 1000), optional QWEN_CODE_MAX_TOKENS_PER_WORKFLOW ceiling, one-time consent banner via skipWorkflowUsageWarning setting, and the keyword-trigger default flip.
Subagent state leakage — subagents share Config / ToolRegistry. Concurrent agents could leak through mutable per-call state in custom MCP servers. Mitigated by auditing Config for mutable per-call state and recommending isolation: 'worktree' for workflows that mutate files concurrently.
Known P1 limitations (deferred to later phases)
Surfaced during PR #4732 R7 review by @DragonnZhang; documented here so they don't get re-raised in subsequent rounds:
In-script async microtask leak after wall-clock timeout — once an in-script async loop (e.g., (async () => { while(true) await 0 })()) starts inside the node:vm context, the wall-clock Promise.race rejects user-side but the microtask loop continues consuming host microtasks. node:vm provides no mechanism to halt async execution once started. Mitigated by the 30-min default cap (QWEN_CODE_MAX_WORKFLOW_SECONDS) and the opt-in feature gate. Proper fix: migrate the sandbox to worker_threads isolation in a future phase, where worker termination drops all in-flight microtasks.
No memory cap on the vm context — node:vm does not enforce a memory limit, so a script like const a=[]; while(true) a.push(new ArrayBuffer(1e8)) can OOM the host process. Operator mitigation: --max-old-space-size flag on the parent Node process. Same proper fix as (1): worker_threads isolation gives the worker its own heap with a resourceLimits.maxOldGenerationSizeMb cap.
Both are acceptable for P1 given the opt-in gate, ask permission level, and 30-min wall-clock backstop — but should be addressed alongside any future phase that loosens the gating (e.g., P7 keyword trigger / ultracode session-mode would broaden the activation surface and make stricter sandbox isolation more important).
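The worker_threads fix named in both limitations could look roughly like the sketch below. It assumes only Node's built-in `node:worker_threads` module; `runInWorker`, its option names, and the result shape are hypothetical and not part of the current codebase.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical sketch: run script text in a worker that has its own heap cap,
// and terminate the worker when the wall-clock deadline passes.
function runInWorker(source, { timeoutMs = 1000, maxOldGenerationSizeMb = 64 } = {}) {
  return new Promise((resolve) => {
    const worker = new Worker(source, {
      eval: true, // treat `source` as code rather than a file path
      resourceLimits: { maxOldGenerationSizeMb },
    });
    const timer = setTimeout(() => {
      // terminate() stops all JavaScript execution in the worker thread.
      worker.terminate().then(() => resolve({ status: 'timeout' }));
    }, timeoutMs);
    worker.once('message', (value) => {
      clearTimeout(timer);
      worker.terminate().then(() => resolve({ status: 'ok', value }));
    });
    worker.once('error', (err) => {
      clearTimeout(timer);
      resolve({ status: 'error', message: String(err && err.message) });
    });
  });
}
```

Unlike the Promise.race approach inside node:vm, the deadline here ends the thread itself, so nothing the script started keeps running on the host.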
tools / disallowedTools / model / effort / permissionMode from the agent definition compose with workflow's default subagent config. P1's hardcoded disallowedTools: [SEND_MESSAGE, EXIT_PLAN_MODE] becomes the floor that agent-level definitions extend (union), not replace
isolation: 'worktree' at the agent-definition level is the default; opts.isolation at the workflow call site overrides per-call
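The two composition rules above (union with the disallow floor, call-site isolation override) can be sketched as follows. All names here are hypothetical, and the string constants merely stand in for the SEND_MESSAGE and EXIT_PLAN_MODE identifiers.

```javascript
// Hypothetical stand-ins for the P1 hardcoded floor.
const WORKFLOW_DISALLOWED_FLOOR = ['SEND_MESSAGE', 'EXIT_PLAN_MODE'];

function resolveSubagentConfig(agentDef = {}, callOpts = {}) {
  // Union: agent-level entries extend the floor and never remove from it.
  const disallowedTools = [
    ...new Set([...WORKFLOW_DISALLOWED_FLOOR, ...(agentDef.disallowedTools ?? [])]),
  ];
  // The workflow call site overrides the agent definition's default isolation.
  const isolation = callOpts.isolation ?? agentDef.isolation;
  return { ...agentDef, disallowedTools, isolation };
}
```

Treating the floor as a union keeps an agent definition from accidentally re-enabling a tool that workflows must never expose.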
Captures actual shipped state and the deltas found in the Claude Code 2.1.168 binary scan. The original plan, decisions, and risks above remain authoritative for the unshipped phases; this section adds (i) what actually shipped vs the original LOC/scope estimate, (ii) what changed upstream between 2.1.160 and 2.1.168, and (iii) confirmed adjacent-infrastructure reuse paths for P3–P7.
On plan, 0 missing features. Tests = 668 LOC (57% of total).
LOC over-run vs the original ~600 / ~300 estimates is dominated by test coverage and review-round security hardening; no unplanned feature shipped. The "extras" below are all positive drift discovered during review.
P1 positive drift (beyond original plan):
30-min async wall-clock timeout (QWEN_CODE_MAX_WORKFLOW_SECONDS, default 1800s) — catches 0-token hangs that vm timeout and the future budget cap cannot reach (T23 R2)
AbortSignal threading into subagent.execute so wall-clock abort propagates into in-flight subagents (T40 R4)
abortOnTimeout child-controller injection seam for explicit timeout coordination (T40 R4)
P2 positive drift (beyond original plan):
Hard ceilings on env-overridable caps: HARD_MAX_AGENTS_PER_RUN_CEILING = 10000, HARD_MAX_CONCURRENCY_CEILING = 64 — prevents fat-finger misconfiguration uncapping a runaway workflow (R1 wenshao T4)
Per-element vm-realm JSON revival of parallel/pipeline results instead of whole-array — closes T1/T8/T14 escape: a single non-serializable thunk result no longer wipes out sibling results (R1 self-review EAD-1)
Dispatch-layer concurrency throttling (not thunk-layer) — prevents nested parallel-in-pipeline deadlock (the canonical /deep-research shape); verified with a gate-based RED test on concurrency=1 (F1 fix)
Observability: debugLogger.warn for rejected thunks (R1), logRevivalFailure hook for non-serializable results (R2) — disambiguates "null at index" failure modes
Limiter prompt-queue abort listener — strengthens limiter invariant when an in-flight thunk hangs (R2)
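The per-element revival item above can be illustrated with a short sketch. It is a simplification under stated assumptions: `reviveEach` and `onFailure` are hypothetical names, and the sketch round-trips through the host's JSON rather than the vm realm's.

```javascript
// Hypothetical sketch: each result is round-tripped through JSON on its own,
// so one bad element becomes null instead of wiping out its siblings.
function reviveEach(results, onFailure = () => {}) {
  return results.map((value, index) => {
    try {
      const text = JSON.stringify(value);
      // Functions and undefined serialize to undefined rather than throwing.
      if (text === undefined) throw new TypeError('value is not JSON-serializable');
      return JSON.parse(text);
    } catch (err) {
      onFailure(index, err); // observability hook for "null at index" cases
      return null;
    }
  });
}
```

Reviving the whole array in one call would let a single failing element discard every sibling result, which is the failure mode the item describes.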
Shipped env vars and caps
| Env var | Default | Hard ceiling | Notes |
| --- | --- | --- | --- |
| QWEN_CODE_ENABLE_WORKFLOWS | unset (off) | n/a | '1' to enable (or enableWorkflows: true setting) |
| QWEN_CODE_DISABLE_WORKFLOWS | unset | n/a | '1' is a force-disable kill switch |
| QWEN_CODE_MAX_WORKFLOW_CONCURRENCY | max(1, min(16, cpus-2)) | 64 | sliding window per run |
| QWEN_CODE_MAX_WORKFLOW_AGENTS | 1000 | 10000 | total agent() calls per run |
| QWEN_CODE_MAX_WORKFLOW_SECONDS | 1800 (30 min) | n/a | wall-clock per run |
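How a default, an env override, and a hard ceiling might combine can be sketched as below. The defaults and ceilings are the values listed above; the helper names and the choice to fall back to the default on an unparsable value are assumptions, not confirmed behaviour.

```javascript
const os = require('node:os');

// Hypothetical helper: parse an env override, fall back to the default,
// and clamp to [1, hardCeiling] so a typo cannot uncap a run.
function resolveCap(raw, defaultValue, hardCeiling) {
  const parsed = Number.parseInt(raw ?? '', 10);
  const value = Number.isFinite(parsed) && parsed > 0 ? parsed : defaultValue;
  return Math.max(1, Math.min(value, hardCeiling));
}

// Default concurrency as listed above: max(1, min(16, cpus - 2)).
function defaultConcurrency(cpuCount = os.cpus().length) {
  return Math.max(1, Math.min(16, cpuCount - 2));
}
```

For instance, `resolveCap(process.env.QWEN_CODE_MAX_WORKFLOW_CONCURRENCY, defaultConcurrency(), 64)` would yield the effective per-run concurrency under these assumptions.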
P3+ injection seams already pre-wired in P1/P2
agent({schema}), agent({agentType}), agent({isolation}), agent({model}) are all STUB-THROW today — P3 replaces the throw with the real implementation, no sandbox re-opening needed
SandboxOptions.budget interface is wired (default spent()/remaining() throw) — P5 injects the real implementation through the existing seam
SandboxOptions.parallel / SandboxOptions.pipeline are populated (P2)
resumeFromRunId + JSONL journal: NOT pre-wired — P6 is a net-new subsystem
Ultracode session mode + workflowKeywordTriggerEnabled: NOT pre-wired — P7 is net-new
Claude Code 2.1.168 reverse pass — deltas vs 2.1.160 baseline
Binary strings cross-compared across 2.1.161 / 2.1.162 / 2.1.168 against the 2.1.160-documented baseline at the top of this issue.
NEW: not in 2.1.160 baseline; candidate for a post-P7 follow-up
Verdict: Workflow surface is stable across 2.1.160 → 2.1.168 for everything #4721 covers. Upstream has already shipped P3 / P5 / P6 (only isolation:'remote' is gated off), giving qwen a clear contract to match. Agent-memory scoping by agentType is new in 2.1.168 — out of #4721's original scope, candidate for a post-P7 follow-up if there's appetite.
Adjacent infrastructure reuse confirmed on origin/main
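| Phase | Existing subsystem | Reuse path | Estimated savings |
| --- | --- | --- | --- |
| P3 schema + agentType | SubagentManager + agent-frontmatter-schema.ts (#4842 + #4996 already on main, CC 2.1.168 parity) | await config.getSubagentManager().findSubagentByName(agentType, 'project') returns SubagentConfig carrying permissionMode, maxTurns, mcpServers, hooks, tools, disallowedTools, color; merge into workflow's hardcoded disallow floor | |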
Agent Team (#4844) integration risk = LOW: Team and Workflow share the same SubagentManager registry so agentType semantics align, but their call sites (team.spawn() vs workflow.agent()) don't overlap. Documentation will need to disambiguate "which to use when".
Refined phase plan (no scope drift, refined LOC estimates)
The plan above remains the authoritative scope; this is a refined estimate based on the confirmed reuse paths. P3 ships as a single PR covering schema + agentType + isolation:'worktree' rather than splitting — the model-facing API and the sandbox-execution wiring are easier to review together than as two coupled PRs.
| Phase | Refined LOC est. (src + tests) | Net-new subsystems | Notes |
| --- | --- | --- | --- |
| P3 schema + agentType + isolation:'worktree' | ~1200-1500 | none (all reuse) | Wires existing SubagentManager, existing worktree subsystem, the new StructuredOutput contract |
What dynamic workflow does that swarm and Agent Team don't
(Comparison table flattened in extraction; surviving cells, in order: /swarm (shipped, #3433), tasks[] array, phase(), pipeline(), agent({ schema }), budget.total / spent / remaining, workflow(name, args).)
Dynamic workflow is strictly additive:
Keep /swarm for the simple "fan out 5 tasks, aggregate" case where a script is overkill.
Keep Agent Team for persistent peer-to-peer collaboration with mailboxes.
Add Workflow for the rich multi-phase orchestration case, matching upstream's Agent + Workflow coexistence.
Use cases this unlocks
Budget-aware fan-out (while (budget.remaining() > 50_000) { spawn more verifiers })
Why the infrastructure cost is small
The implementation reuses most of what qwen-code already ships:
(Reuse table flattened in extraction; surviving cells, in order:)
packages/core/src/agents/runtime/agent-headless.ts
agent() → AgentHeadless.create()
<task-notification>
packages/core/src/agents/background-tasks.ts
'workflow' to TaskKind
SendMessageType.Notification
BackgroundTasksPill / dialog / live panel
KIND_NAMES with 'workflow'
UiTelemetryService.bySource
phase:label
QWEN_CODE_MAX_BACKGROUND_AGENTS
Config.getToolRegistry()
packages/cli/src/services/BuiltinCommandLoader.ts
/workflows, /effort
WORKFLOW_SUBAGENT_PROMPT constant
The only genuinely new subsystem is a node:vm-based JS sandbox (~150–250 LOC) that injects the documented globals and stubs Date.now / Math.random for resume determinism. Everything else is wiring.
Additional context
Phased implementation plan
Each phase is independently shippable behind a feature gate:
P1: Workflow tool: node:vm sandbox + sequential agent() + phase() + log(); foreground; no parallel/pipeline/schema/budget/resume
P2: parallel(thunks) + pipeline(items, ...stages) + 16-concurrent / 1000-total caps + errors-as-data
P3: agent({ schema, agentType }) → forced StructuredOutput contract + 2-nudge in-conversation retry on schema validation failure; agentType resolves against the declarative-agents registry from #4821 (graceful fallback to the built-in workflow subagent if agentType is unset or fails to resolve)
P4: Extract meta ({name, description, whenToUse?, phases?[{title, detail?, model?}]}) before stripping it from the script source (replaces P1's stripExportMeta with extractAndStripMeta); /workflows slash command + phase-tree progress UI + BackgroundTasksPill KIND_NAMES extension
P5: budget global + per-phase token rollup + optional per-run token ceiling
P6: [...] <projectDir>/workflows/<sessionId>/
P7: [...] workflowKeywordTriggerEnabled keyword trigger
Relation to existing features (not replaced)
/swarm (feat(core): add dynamic swarm worker tool #3433): keep as-is. Targets the "fan out N tasks, aggregate" case where a workflow script is overkill.
[...] workflow becomes the 5th kind consumer (agent / shell / monitor / dream / workflow), no framework change.
Related upstream ports (coordinate with)
[...] (.qwen/agents/<name>.md frontmatter). The workflow's agent(prompt, { agentType }) option resolves agentType against the same registry that #4821 (feat(agents): support declarative agent definitions via frontmatter files) builds. Joint design surface:
name (#4821 frontmatter) ↔ agentType (workflow opts): the same string key
tools / disallowedTools / model / effort / permissionMode from the agent definition compose with workflow's default subagent config. P1's hardcoded disallowedTools: [SEND_MESSAGE, EXIT_PLAN_MODE] becomes the floor that agent-level definitions extend (union), not replace
isolation: 'worktree' at the agent-definition level is the default; opts.isolation at the workflow call site overrides per-call
agentType support lands in P3, gated on #4821's registry being available (or graceful fallback when the named type doesn't resolve)
Upstream references
@anthropic-ai/claude-code-darwin-arm64@2.1.160 binary + on-disk generated workflow scripts at ~/.claude/projects/<session>/workflows/scripts/
Acceptance criteria
[...] enableWorkflows setting is on; disabled by default in v1 (P1, #4732: Workflow tool P1, minimal node:vm sandbox + sequential agent())
[...] Workflow({ script, args }) and the script runs in a node:vm sandbox with the documented globals (P1, #4732)
phase(), agent(), log() work sequentially (P1, #4732)
parallel() / pipeline() run concurrently with cap enforcement and errors-as-data null semantics (P2, #4947: Workflow P2, parallel() + pipeline() concurrent fan-out)
agent({ schema }) returns validated objects, retries on mismatch, returns null on skip (P3)
agent({ agentType }) resolves against the declarative-agents registry (#4821) when present; falls back to the built-in workflow subagent prompt when agentType is unset or doesn't resolve (P3)
/workflows slash command lists active and completed runs; pill shows workflow count (P4)
budget.total / spent() / remaining() reflect output-token usage; hard ceiling throws WorkflowBudgetExceededError (P5)
resumeFromRunId returns cached prefix instantly; first edited or new agent() runs live (P6)
[...] /swarm and Agent Team features remain unchanged; no regression in their test suites (confirmed at each PR)
Update (2026-06-12): P1 + P2 shipped, 2.1.168 reverse pass
Claude Code 2.1.168 reverse pass: deltas vs 2.1.160 baseline
Unchanged: concurrent cap max(1, min(16, cpus-2)), per-run agent cap 1000, schema nudge count 2 in-conversation, wall-clock 30 min, single-level workflow nesting.
New upstream features (post-2.1.160) confirmed shipped:
agent({schema}) enforcement: "subagent completed without calling StructuredOutput (after 2 in-conversation nudges)"
agent({agentType}): "agent({agentType}): agent type '{agentType}' not found"
agent({isolation:'worktree'})
agent({isolation:'remote'}): "agent({isolation:'remote'}) is not available in this build"
tengu_workflow_budget_cap_exceeded
tengu_workflow_journal_started_hit_respawn
Agent-memory scoping: "Scope for auto-loading agent memory files. 'user' - ~/.claude/agent-memory/<agentType>/, 'project' - .claude/agent-memory/<agentType>/"
Adjacent infrastructure reuse confirmed on origin/main
(Table flattened in extraction; rows below pair each existing subsystem with its reuse path. Phase labels other than the first are inferred from the phase plan, and the Estimated savings column did not survive.)
P3 schema + agentType: SubagentManager + agent-frontmatter-schema.ts (#4842 + #4996 already on main, CC 2.1.168 parity) → await config.getSubagentManager().findSubagentByName(agentType, 'project') returns SubagentConfig carrying permissionMode, maxTurns, mcpServers, hooks, tools, disallowedTools, color; merge into workflow's hardcoded disallow floor
P3 isolation: [...] agent({isolation:'worktree'}) to spawn subagent in a fresh worktree; reuse cleanup helpers
P4: BackgroundTasksPill + BuiltinCommandLoader → [...] TaskKind union with 'workflow', append to KIND_NAMES map, register workflowsCommand mirroring tasksCommand / hookCommand patterns
P5: AgentStatistics per-agent token tracker → WorkflowBudgetImpl.spent() sums per-phase agent tokens; populate the existing P1 SandboxOptions.budget seam
P6: jsonl-utils (writeLine / readLines / countLines) + FileHistoryService serialization pattern (#4897) → WorkflowStateRecord JSONL type; append on each phase completion; on resume, longest-prefix-matched replay
P7: config.ts settings layer → workflowKeywordTriggerEnabled mirroring agentTeamEnabled / forkSubagentEnabled patterns
Refined phase plan (surviving notes from the flattened table)
P3: Wires existing SubagentManager, existing worktree subsystem, the new StructuredOutput contract
P4: [...] BackgroundTasksPill / BuiltinCommandLoader
P5: [...] SandboxOptions.budget seam
P6: [...] WorkflowStateRecord JSONL type; jsonl-utils
Remaining total ≈ 3000-3950 LOC across 5 PRs (average ~600/PR vs P1+P2 average ~2100/PR, which should make reviews easier).