Feature Request: Port Dynamic Workflows / Ultracode from Claude Code 2.1.160

## What would you like to be added?

Port the **Dynamic Workflows** feature (announced by Anthropic in [Claude Code 2.1.160](https://claude.com/blog/introducing-dynamic-workflows-in-claude-code)) to qwen-code as a third tier of multi-agent execution, complementary to the existing `/swarm` tool (#3433) and the in-progress Agent Team (#2886).

### What a dynamic workflow is

A **model-authored JavaScript script** that runs in a sandbox and orchestrates many subagents through a small set of primitives. The model writes the script on-the-fly for the user's request; the runtime sandboxes it; subagents fan out through the existing headless-agent path; one aggregated result returns to the main conversation.

The full API surface (all confirmed against upstream's published `/deep-research` workflow script and binary strings):

```ts
// Required first statement of every script
export const meta = {
  name: string,
  description: string,
  whenToUse?: string,
  phases?: Array<{ title: string, detail?: string, model?: string }>,
}

// Injected globals
phase(title: string): void
parallel<T>(thunks: Array<() => Promise<T>>): Promise<Array<T | null>>
pipeline<T>(items: T[], ...stages: Array<(prev, item: T, idx: number) => Promise<any>>): Promise<any[]>
agent(prompt: string, opts?: {
  label?: string, phase?: string, schema?: object,
  model?: string, isolation?: 'worktree' | 'remote', agentType?: string,
}): Promise<any>
log(message: string): void
workflow(nameOrRef: string | { scriptPath: string }, args?: any): Promise<any>

args: any
budget: { total: number | null, spent(): number, remaining(): number }

// Stubbed (throw) to guarantee resume determinism
Date.now(), new Date(), Math.random()
```

Concrete example from Anthropic's shipped `/deep-research` workflow:

```js
phase('Scope')
const scope = await agent('Decompose this research question into 5 search angles...', { schema: SCOPE_SCHEMA })

const searchResults = await pipeline(
  scope.angles,
  angle => agent(SEARCH_PROMPT(angle), { phase: 'Search', schema: SEARCH_SCHEMA }),
  searchResult => parallel(novelUrls(searchResult).map(src => () =>
    agent(FETCH_PROMPT(src), { phase: 'Fetch', schema: EXTRACT_SCHEMA }))),
)

phase('Verify')
const voted = await parallel(rankedClaims.map(claim => () =>
  parallel(Array.from({ length: 3 }, (_, v) => () =>
    agent(VERIFY_PROMPT(claim, v), { phase: 'Verify', schema: VERDICT_SCHEMA })))
    .then(verdicts => ({ ...claim, survives: verdicts.filter(v => v.refuted).length < 2 }))))

phase('Synthesize')
const report = await agent('Merge confirmed claims; write report...', { schema: REPORT_SCHEMA })
return { question, ...report }
```

### Triggers

Four invocation paths matching upstream:

1. **Keyword in prompt** — including the word `workflow` in a single-turn user prompt opts that turn into a workflow.
2. **`/effort ultracode`** (session-only mode) — once enabled, the model auto-spawns workflows for substantive turns until session ends.
3. **Saved slash command** — workflow scripts at `.qwen/workflows/` (project) or `~/.qwen/workflows/` (user) are surfaced as slash commands (the `/deep-research` invocation style).
4. **Direct `Workflow` tool call** by the model with inline `script` or `name`.

### Hard caps (verbatim from upstream)

- Concurrent agents: `max(1, min(16, os.cpus().length - 2))` per workflow
- Total agent calls: `1000` per workflow lifetime
- Schema-mismatch nudges: **2 in-conversation nudges** per agent (binary line 307789 — error message reads "subagent completed without calling StructuredOutput (after 2 in-conversation nudges)"). Distinct from the stall-retry counter `VOK = 5` which fires on no-progress agents, not schema validation failures.
- Single nesting level for `workflow()` (workflow inside a child workflow throws)
- Same-session resume only (in initial scope; cross-session is a v1.5 candidate, see decisions below)
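
The two agent-count caps above can be enforced in one place, at the point where `agent()` dispatches. The sketch below is a minimal illustration under stated assumptions, not the shipped limiter: `createAgentLimiter`, `maxConcurrent`, `maxTotal`, and the fake agent are hypothetical names introduced here.

```javascript
// Hypothetical helper: caps in-flight and lifetime agent() calls for one run.
function createAgentLimiter({ maxConcurrent, maxTotal }) {
  let active = 0; // slots currently held
  let total = 0; // calls seen over the workflow lifetime
  const waiting = []; // resolvers for calls queued behind the cap

  const release = () => {
    const next = waiting.shift();
    if (next) next(); // hand the slot straight to the next waiter
    else active -= 1;
  };

  return async function run(task) {
    total += 1;
    if (total > maxTotal) {
      throw new Error(`agent call cap exceeded (${maxTotal} per workflow)`);
    }
    if (active >= maxConcurrent) {
      // Wait until release() hands this call a slot.
      await new Promise((resolve) => waiting.push(resolve));
    } else {
      active += 1;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

async function demo() {
  const limit = createAgentLimiter({ maxConcurrent: 2, maxTotal: 5 });
  let inFlight = 0;
  let peak = 0;
  // Stand-in for a real subagent call.
  const fakeAgent = (id) => async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5));
    inFlight -= 1;
    return id;
  };
  const results = await Promise.all(
    [1, 2, 3, 4, 5].map((id) => limit(fakeAgent(id))),
  );
  const sixth = await limit(fakeAgent(6)).catch((err) => err.message);
  return { results, peak, sixth };
}

demo().then((out) => console.log(out));
```

Handing a released slot directly to the next waiter, rather than decrementing and re-checking, keeps a newly arriving call from slipping past queued ones.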

## Why is this needed?

### What dynamic workflow does that swarm and Agent Team don't

| Capability | `/swarm` (shipped, #3433) | Agent Team (PR #2886) | Dynamic Workflow (this proposal) |
|---|---|---|---|
| Programming model | Declarative `tasks[]` array | Imperative team / mailbox API | Imperative JS script |
| Multi-phase orchestration | ❌ single shot | ✅ via task board state | ✅ via `phase()` |
| Pipeline (staggered, non-barrier) across stages | ❌ | ❌ | ✅ via `pipeline()` |
| Structured-output / schema-validated agents | ❌ | ❌ | ✅ via `agent({ schema })` |
| Resume / cached-prefix re-entry | ❌ | ❌ | ✅ longest-unchanged-prefix |
| Programmable token budget | ❌ | ❌ | ✅ `budget.total/spent/remaining` |
| Nested / saved workflows | ❌ | ❌ | ✅ via `workflow(name, args)` |
| Inter-agent communication | ❌ | ✅ peer-to-peer mailbox | ❌ results return to script |
| Lifecycle | Ephemeral | Persistent collaboration | Ephemeral per-script |

Dynamic workflow is **strictly additive**:

- Keep `/swarm` for the simple "fan out 5 tasks, aggregate" case where a script is overkill.
- Keep Agent Team for persistent peer-to-peer collaboration with mailboxes.
- Add `Workflow` for the rich multi-phase orchestration case — matching upstream's `Agent` + `Workflow` coexistence.

### Use cases this unlocks

- **Adversarial verification** (deep-research-style 3-vote claim refutation against schema-validated claims)
- **Map-reduce with typed outputs** (extract findings as JSON objects, not freeform text)
- **Multi-stage pipelines without per-stage barriers** (item A in stage 3 while item B is still in stage 1 — dramatically reduces wall-clock for long fan-outs)
- **Resumable long runs** (pause/restart without re-burning the agents that already completed)
- **Cost-bounded loops** (`while (budget.remaining() > 50_000) { spawn more verifiers }`)
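
The "no per-stage barrier" behaviour can be illustrated with a small stand-in for `pipeline()`. This is a sketch under assumptions, not upstream's implementation: `pipelineSketch` is a hypothetical name, the first stage is assumed to receive the item itself as `prev`, and a failed item is assumed to collapse to `null` (errors-as-data).

```javascript
// Hypothetical sketch: each item walks the stages on its own promise chain,
// so a fast item never waits for a slow sibling between stages.
async function pipelineSketch(items, ...stages) {
  return Promise.all(
    items.map(async (item, idx) => {
      try {
        let prev = item; // assumption: stage 1 sees the item as `prev`
        for (const stage of stages) {
          prev = await stage(prev, item, idx);
        }
        return prev;
      } catch {
        return null; // a failed item does not reject the whole run
      }
    }),
  );
}

async function demo() {
  const events = [];
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const results = await pipelineSketch(
    [
      { name: 'A', ms: 5 },
      { name: 'B', ms: 60 },
    ],
    async (prev, item) => {
      await sleep(item.ms);
      events.push(`${item.name}:stage1`);
      return item.name;
    },
    async (prev, item) => {
      events.push(`${item.name}:stage2`);
      return `${prev}-done`;
    },
  );
  return { events, results };
}

demo().then((out) => console.log(out));
```

In the demo, item A finishes stage 2 while item B is still in stage 1, which is the ordering a barrier-per-stage design would forbid.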

### Why the infrastructure cost is small

The implementation reuses most of what qwen-code already ships:

| Subsystem | Existing PR / file | Reuse path |
|---|---|---|
| Headless subagent dispatch | #3076 #3970, `packages/core/src/agents/runtime/agent-headless.ts` | Each `agent()` → `AgentHeadless.create()` |
| Background-task envelope + `<task-notification>` | #3471 #3488 #3739, `packages/core/src/agents/background-tasks.ts` | Add `'workflow'` to `TaskKind` |
| Notification routing into main loop | #3471, `SendMessageType.Notification` | Zero change |
| `BackgroundTasksPill` / dialog / live panel | #3488 #3768 #4477 | Extend `KIND_NAMES` with `'workflow'` |
| Per-source token aggregation | `UiTelemetryService.bySource` | Key by `phase:label` |
| Background-agent resume (1068 LOC, battle-tested) | #3739 | Reusable for cross-session resume v1.5 |
| Concurrency cap pattern | #4324, `QWEN_CODE_MAX_BACKGROUND_AGENTS` | Workflow-scoped variant |
| MCP propagation to subagents | shared `Config.getToolRegistry()` | Zero change |
| Slash command registration | `packages/cli/src/services/BuiltinCommandLoader.ts` | Append `/workflows`, `/effort` |
| Subagent system-prompt template | existing built-in agents | New `WORKFLOW_SUBAGENT_PROMPT` constant |

**The only genuinely new subsystem** is a `node:vm`-based JS sandbox (~150–250 LOC) that injects the documented globals and stubs `Date.now / Math.random` for resume determinism. Everything else is wiring.
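
A rough idea of what such a sandbox involves, as a sketch rather than the shipped code: `runWorkflowSketch` and the fake `agent` are hypothetical, and the way `Date` is stubbed here (a `Proxy`) is one possible approach, not necessarily the one used upstream or in qwen-code.

```javascript
const vm = require('node:vm');

// Hypothetical sketch: run a workflow script body with only the injected globals.
function runWorkflowSketch(scriptBody, globals) {
  // A fresh context has the JS built-ins but no process / require / fs.
  const context = vm.createContext({ ...globals });

  // Stub nondeterministic built-ins inside the context so replays are repeatable.
  vm.runInContext(
    `(() => {
      const deny = (name) => {
        throw new Error(name + ' is disabled in workflow scripts');
      };
      Math.random = () => deny('Math.random()');
      globalThis.Date = new Proxy(Date, {
        apply: () => deny('Date()'),
        construct: (target, args) =>
          args.length === 0 ? deny('new Date()') : Reflect.construct(target, args),
        get: (target, key) =>
          key === 'now' ? () => deny('Date.now()') : Reflect.get(target, key),
      });
    })();`,
    context,
  );

  // Scripts use top-level await and return, so wrap the body in an async function.
  const wrapped = `(async () => {\n${scriptBody}\n})()`;
  // Note: `timeout` only bounds the synchronous part of the run.
  return vm.runInContext(wrapped, context, { timeout: 1000 });
}

async function demo() {
  const logs = [];
  const globals = {
    log: (message) => logs.push(message),
    agent: async (prompt) => `echo: ${prompt}`, // stand-in for a real subagent
  };
  const result = await runWorkflowSketch(
    "log('start'); const out = await agent('hi');" +
      ' return { out, hasProcess: typeof process, hasRequire: typeof require };',
    globals,
  );
  const blocked = await runWorkflowSketch('return Date.now();', globals).catch(
    (err) => err.message,
  );
  return { logs, result, blocked };
}

demo().then((out) => console.log(out));
```

Injected host functions such as `agent` still carry host-realm prototypes into the context, which is one reason `node:vm` is treated here as defense in depth rather than a hard boundary.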

## Additional context

### Local design artifacts

A full design pass has been completed against upstream `@anthropic-ai/claude-code@2.1.160`. The API surface was live-verified against the actual `/deep-research` workflow script captured from `~/.claude/projects/<session>/workflows/scripts/` and cross-checked against binary strings. The following artifacts will be committed alongside the implementation PR:

- **Main design doc** (788 lines): `.qwen/design/dynamic-workflow-alignment-claude-code-2.1.160.md` — per-axis upstream findings, qwen-code fit matrix (11 subsystems), gap analysis, phased plan, risks
- **Live-probe delta** (~280 lines): `.qwen/design/dynamic-workflow-alignment-claude-code-2.1.160-liveprobe-delta.md` — API surface confirmation, with binary line citations and `/deep-research` source code references
- **E2E test plan** (812 lines): `.qwen/e2e-tests/dynamic-workflow-alignment.md` — per-phase scenarios with stub-server harness

Every API signature in the design is **Confirmed** against either Anthropic's published `/deep-research` workflow source or the binary's literal tool-description constant.

### Phased implementation plan

Each phase is independently shippable behind a feature gate:

| Phase | Scope | Est. LOC |
|---|---|---|
| **P1** | Minimal `Workflow` tool: `node:vm` sandbox + sequential `agent()` + `phase()` + `log()`; foreground; no parallel/pipeline/schema/budget/resume | ~600 |
| **P2** | `parallel(thunks)` + `pipeline(items, ...stages)` + 16-concurrent / 1000-total caps + errors-as-data | ~300 |
| **P3** | `agent({ schema, agentType })` → forced `StructuredOutput` contract + 2-nudge in-conversation retry on schema validation failure; `agentType` resolves against the declarative-agents registry from #4821 (graceful fallback to the built-in workflow subagent if `agentType` is unset or fails to resolve) | ~300 |
| **P4** | Extract `meta` (`{name, description, whenToUse?, phases?[{title, detail?, model?}]}`) before stripping it from the script source (replaces P1's `stripExportMeta` with `extractAndStripMeta`); `/workflows` slash command + phase-tree progress UI + `BackgroundTasksPill` `KIND_NAMES` extension | ~400 |
| **P5** | `budget` global + per-phase token rollup + optional per-run token ceiling | ~200 |
| **P6** | Resume via longest-unchanged-prefix cache; JSONL journal under `<projectDir>/workflows/<sessionId>/` | ~400 |
| **P7** (optional) | Ultracode session-mode toggle + `workflowKeywordTriggerEnabled` keyword trigger | ~200 |
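
The P6 "longest-unchanged-prefix" idea can be sketched as follows. Everything here is an assumption made for illustration: the journal shape (`{ key, result }`), the `callKey` hash over prompt and options, and the `createReplayer` name are hypothetical, and the sketch is synchronous and ignores the ordering questions that concurrent `agent()` calls would raise.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical cache key: a hash of the agent() prompt and options.
// (JSON.stringify is key-order sensitive; a real key would need canonical form.)
const callKey = (prompt, opts = {}) =>
  createHash('sha256').update(JSON.stringify([prompt, opts])).digest('hex');

// Hypothetical replayer: serves cached results while calls match the journal
// in order, then switches to live execution for the rest of the run.
function createReplayer(journal, runLive) {
  let cursor = 0;
  let replaying = true;
  return function agent(prompt, opts) {
    const key = callKey(prompt, opts);
    if (replaying && cursor < journal.length && journal[cursor].key === key) {
      const { result } = journal[cursor];
      cursor += 1;
      return { source: 'cache', result };
    }
    replaying = false; // the first edited or new call ends the unchanged prefix
    return { source: 'live', result: runLive(prompt, opts) };
  };
}

// Journal recorded by a previous run of prompts a, b, c.
const journal = ['a', 'b', 'c'].map((prompt) => ({
  key: callKey(prompt),
  result: `cached:${prompt}`,
}));

// The edited script changes the second prompt.
const agent = createReplayer(journal, (prompt) => `live:${prompt}`);
const replayed = ['a', 'b-edited', 'c'].map((prompt) => agent(prompt));
console.log(replayed.map((entry) => entry.source));
```

Once the prefix breaks, later calls run live even if their prompt is unchanged, since their inputs may depend on the edited call.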

### Decisions needed before P1 (qwen-specific divergences from upstream)

These need maintainer sign-off because they leak into settings shape, env var names, and UI surface:

1. **JS sandbox choice**: `node:vm` (zero dep, weak isolation; matches upstream's defense-in-depth posture, since the script has no fs/shell surface by design) vs `isolated-vm` (strong V8-level isolate, but adds a native dep that breaks the "pure JS, single `npm install`" property and requires prebuilt binaries for all platforms). Recommendation: `node:vm` for v1, escalate to `isolated-vm` only if a hostile-script threat model emerges.

2. **Keyword trigger default**: upstream sets `workflowKeywordTriggerEnabled = true`. qwen-code users skew cost-sensitive across DashScope / OpenAI-compatible providers. Recommendation: **default `false` on qwen-code**, require explicit opt-in via settings. Diverges from upstream.

3. **Per-run token ceiling**: upstream has only agent-count caps (16 concurrent / 1000 total) — no programmable token limit. Recommendation: add a qwen-only `QWEN_CODE_MAX_TOKENS_PER_WORKFLOW` env var, default unset, as a safety net. Diverges from upstream.

4. **Saved-workflows directory**: `.qwen/workflows/` + `~/.qwen/workflows/` (matches qwen convention) vs `.claude/workflows/` + `~/.claude/workflows/` (matches upstream literal paths and aids portability of shared workflows across tools). Recommendation: `.qwen/` paths; copy-pasted upstream workflows need a path adjustment.

5. **Cross-session resume**: upstream is strictly same-session only. qwen-code already ships cross-session background-agent resume (#3739, 1068 LOC). Recommendation: ship same-session in v1 to match upstream, extend to cross-session in v1.5 as a qwen-only improvement.

6. **Ultracode persistence semantics**: upstream's `ultracode: true` is session-only (does not persist across sessions). qwen-code's settings layer has no "session-only key" concept today. Recommendation: match upstream — require re-toggle per session, document it.

### Risks

1. **JS sandbox security** — `node:vm` is not a true security boundary. Mitigated by the fact that the script has no fs/shell surface by design (only spawned agents do I/O); we enforce this by not injecting `process` / `require` / `fs` / `child_process` into the context. Escalation path to `isolated-vm` if a real attacker model emerges. Workflow scripts are model-authored, not arbitrary user input.

2. **Token cost amplification** — 16× concurrency × deep nesting × shared plan billing can burn quota fast. Mitigated by agent-count caps (16 / 1000), optional `QWEN_CODE_MAX_TOKENS_PER_WORKFLOW` ceiling, one-time consent banner via `skipWorkflowUsageWarning` setting, and the keyword-trigger default flip.

3. **Subagent state leakage** — subagents share `Config` / `ToolRegistry`. Concurrent agents could leak through mutable per-call state in custom MCP servers. Mitigated by auditing `Config` for mutable per-call state and recommending `isolation: 'worktree'` for workflows that mutate files concurrently.

### Known P1 limitations (deferred to later phases)

Surfaced during PR #4732 R7 review by @DragonnZhang; documented here so they don't get re-raised in subsequent rounds:

1. **In-script async microtask leak after wall-clock timeout** — once an in-script async loop (e.g., `(async () => { while(true) await 0 })()`) starts inside the `node:vm` context, the wall-clock `Promise.race` rejects user-side but the microtask loop continues consuming host microtasks. `node:vm` provides no mechanism to halt async execution once started. Mitigated by the 30-min default cap (`QWEN_CODE_MAX_WORKFLOW_SECONDS`) and the opt-in feature gate. Proper fix: migrate the sandbox to `worker_threads` isolation in a future phase, where worker termination drops all in-flight microtasks.

2. **No memory cap on the vm context** — `node:vm` does not enforce a memory limit, so a script like `const a=[]; while(true) a.push(new ArrayBuffer(1e8))` can OOM the host process. Operator mitigation: `--max-old-space-size` flag on the parent Node process. Same proper fix as (1): `worker_threads` isolation gives the worker its own heap with a `resourceLimits.maxOldGenerationSizeMb` cap.

Both are acceptable for P1 given the opt-in gate, `ask` permission level, and 30-min wall-clock backstop — but should be addressed alongside any future phase that loosens the gating (e.g., P7 keyword trigger / ultracode session-mode would broaden the activation surface and make stricter sandbox isolation more important).
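
For reference, the `worker_threads` direction mentioned in both items might look roughly like this. It is a sketch only: `runInWorkerSketch`, `timeoutMs`, and `maxHeapMb` are hypothetical names, and a real migration would also need to proxy the injected globals across the thread boundary, which is not shown.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical sketch: run script text in a worker with a heap cap, and
// terminate the worker when the wall-clock budget runs out.
function runInWorkerSketch(source, { timeoutMs, maxHeapMb }) {
  return new Promise((resolve) => {
    const worker = new Worker(source, {
      eval: true, // treat `source` as code rather than a file path
      resourceLimits: { maxOldGenerationSizeMb: maxHeapMb },
    });
    let outcome = 'exited';
    const timer = setTimeout(() => {
      outcome = 'terminated-on-timeout';
      // Stops JavaScript execution in the worker, including queued microtasks.
      worker.terminate();
    }, timeoutMs);
    worker.on('error', (err) => {
      outcome = `error: ${err.code ?? err.message}`;
    });
    worker.on('exit', () => {
      clearTimeout(timer);
      resolve(outcome);
    });
  });
}

async function demo() {
  // The runaway microtask loop from limitation (1).
  const runaway = await runInWorkerSketch(
    '(async () => { while (true) await 0; })();',
    { timeoutMs: 100, maxHeapMb: 64 },
  );
  const wellBehaved = await runInWorkerSketch('1 + 1;', {
    timeoutMs: 5000,
    maxHeapMb: 64,
  });
  return { runaway, wellBehaved };
}

demo().then((out) => console.log(out));
```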

### Relation to existing features (not replaced)

- **`/swarm` (#3433)** — keep as-is. Targets the "fan out N tasks, aggregate" case where a workflow script is overkill.
- **Agent Team (#2886, in progress)** — keep as-is. Targets persistent peer-to-peer collaboration with mailboxes and shared task boards.
- **Background task framework (Issue #3634)** — `workflow` becomes the 5th `kind` consumer (`agent` / `shell` / `monitor` / `dream` / `workflow`), no framework change.

### Related upstream ports (coordinate with)

- **#4821 — Declarative agent definitions (`.qwen/agents/<name>.md` frontmatter)**. The workflow's `agent(prompt, { agentType })` option resolves `agentType` against the same registry that #4821 builds. Joint design surface:
  - `name` (#4821 frontmatter) ↔ `agentType` (workflow opts) — the same string key
  - `tools` / `disallowedTools` / `model` / `effort` / `permissionMode` from the agent definition compose with workflow's default subagent config. P1's hardcoded `disallowedTools: [SEND_MESSAGE, EXIT_PLAN_MODE]` becomes the floor that agent-level definitions extend (union), not replace
  - `isolation: 'worktree'` at the agent-definition level is the default; `opts.isolation` at the workflow call site overrides per-call
  - Workflow P1–P2 ship without #4821 (sequential / parallel / pipeline only use the built-in workflow subagent). `agentType` support lands in P3, gated on #4821's registry being available (or graceful fallback when the named type doesn't resolve)
  - Workflow and #4821 are independently shippable; their joint surface is reviewed once before P3 lands
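
The "floor that definitions extend" and "call site overrides" rules above reduce to a set union and a fallback chain. A minimal sketch, assuming placeholder tool names and hypothetical helper names (`resolveDisallowedTools`, `resolveIsolation`):

```javascript
// Placeholder names standing in for the SEND_MESSAGE / EXIT_PLAN_MODE constants.
const WORKFLOW_DISALLOW_FLOOR = ['send_message', 'exit_plan_mode'];

// The floor always applies; an agent definition can only add to it.
function resolveDisallowedTools(agentDefinition = {}) {
  const extra = agentDefinition.disallowedTools ?? [];
  return [...new Set([...WORKFLOW_DISALLOW_FLOOR, ...extra])];
}

// The per-call option wins; otherwise fall back to the agent definition.
function resolveIsolation(agentDefinition = {}, opts = {}) {
  return opts.isolation ?? agentDefinition.isolation;
}

const reviewer = { disallowedTools: ['shell', 'send_message'], isolation: 'worktree' };
console.log(resolveDisallowedTools(reviewer));
console.log(resolveIsolation(reviewer, {}));
console.log(resolveIsolation(reviewer, { isolation: 'remote' }));
```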

### Upstream references

- Announcement: <https://claude.com/blog/introducing-dynamic-workflows-in-claude-code>
- Reverse-engineered against: `@anthropic-ai/claude-code-darwin-arm64@2.1.160` binary + on-disk generated workflow scripts at `~/.claude/projects/<session>/workflows/scripts/`

### Acceptance criteria

- [x] Workflow tool registered when an `enableWorkflows` setting is on; disabled by default in v1 — P1 #4732
- [x] Model can call `Workflow({ script, args })` and the script runs in a `node:vm` sandbox with the documented globals — P1 #4732
- [x] `phase()`, `agent()`, `log()` work sequentially (P1) — P1 #4732
- [x] `parallel()` / `pipeline()` run concurrently with cap enforcement and errors-as-data null semantics (P2) — P2 #4947
- [ ] `agent({ schema })` returns validated objects, retries on mismatch, returns `null` on skip (P3)
- [ ] `agent({ agentType })` resolves against the declarative-agents registry (#4821) when present; falls back to the built-in workflow subagent prompt when `agentType` is unset or doesn't resolve (P3)
- [ ] `/workflows` slash command lists active and completed runs; pill shows workflow count (P4)
- [ ] `budget.total / spent() / remaining()` reflect output-token usage; hard ceiling throws `WorkflowBudgetExceededError` (P5)
- [ ] Resume via `resumeFromRunId` returns cached prefix instantly; first edited or new `agent()` runs live (P6)
- [x] Existing `/swarm` and `Agent Team` features remain unchanged; no regression in their test suites — confirmed at each PR

---

## Update — 2026-06-12: P1 + P2 shipped, 2.1.168 reverse pass

Captures actual shipped state and the deltas found in the Claude Code 2.1.168 binary scan. The original plan, decisions, and risks above remain authoritative for the unshipped phases; this section adds (i) what actually shipped vs the original LOC/scope estimate, (ii) what changed upstream between 2.1.160 and 2.1.168, and (iii) confirmed adjacent-infrastructure reuse paths for P3–P7.

### Shipped phases

| Phase | PR | Merged | Source LOC | Total LOC (with tests) | Plan LOC | Scope verdict |
|---|---|---|---|---|---|---|
| **P1** | #4732 | 2026-06-09 | ~1207 | 3112 | ~600 | On plan, 0 missing features. Tests = 1905 LOC (61% of total). |
| **P2** | #4947 | 2026-06-12 06:16 UTC | ~541 | 1173 | ~300 | On plan, 0 missing features. Tests = 668 LOC (57% of total). |

LOC over-run vs the original ~600 / ~300 estimates is dominated by test coverage and review-round security hardening; no unplanned feature shipped. The "extras" below are all positive drift discovered during review.

**P1 positive drift (beyond original plan):**
- 30-min async wall-clock timeout (`QWEN_CODE_MAX_WORKFLOW_SECONDS`, default 1800s) — catches 0-token hangs that vm timeout and the future budget cap cannot reach (T23 R2)
- `AbortSignal` threading into `subagent.execute` so wall-clock abort propagates into in-flight subagents (T40 R4)
- `abortOnTimeout` child-controller injection seam for explicit timeout coordination (T40 R4)

**P2 positive drift (beyond original plan):**
- Hard ceilings on env-overridable caps: `HARD_MAX_AGENTS_PER_RUN_CEILING = 10000`, `HARD_MAX_CONCURRENCY_CEILING = 64` — prevents fat-finger misconfiguration uncapping a runaway workflow (R1 wenshao T4)
- Per-element vm-realm JSON revival of `parallel`/`pipeline` results instead of whole-array — closes T1/T8/T14 escape: a single non-serializable thunk result no longer wipes out sibling results (R1 self-review EAD-1)
- Dispatch-layer concurrency throttling (not thunk-layer) — prevents nested `parallel`-in-`pipeline` deadlock (the canonical `/deep-research` shape); verified with a gate-based RED test on `concurrency=1` (F1 fix)
- Observability: `debugLogger.warn` for rejected thunks (R1), `logRevivalFailure` hook for non-serializable results (R2) — disambiguates "null at index" failure modes
- Limiter prompt-queue abort listener — strengthens limiter invariant when an in-flight thunk hangs (R2)
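
The per-element revival point is easiest to see in code. The sketch below is a host-only illustration with hypothetical names (`reviveEach`, `onFailure`); the shipped version revives into the vm realm, which this example does not attempt.

```javascript
// Hypothetical sketch: round-trip each result through JSON on its own, so one
// non-serializable element becomes null instead of discarding its siblings.
function reviveEach(results, onFailure = () => {}) {
  return results.map((value, index) => {
    try {
      const json = JSON.stringify(value);
      // JSON.stringify returns undefined for functions and undefined itself.
      return json === undefined ? null : JSON.parse(json);
    } catch (err) {
      onFailure(index, err); // e.g. circular structures or BigInt values
      return null;
    }
  });
}

const circular = { claim: 'b' };
circular.self = circular;

const failures = [];
const revived = reviveEach(
  [{ claim: 'a' }, circular, { claim: 'c' }],
  (index) => failures.push(index),
);
console.log(revived, failures);
```

Reporting the failing index through a hook is what lets a caller tell "agent returned nothing" apart from "result could not be serialized".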

### Shipped env vars and caps

| Env var | Default | Hard ceiling | Notes |
|---|---|---|---|
| `QWEN_CODE_ENABLE_WORKFLOWS` | unset (off) | n/a | `'1'` to enable (or `enableWorkflows: true` setting) |
| `QWEN_CODE_DISABLE_WORKFLOWS` | unset | n/a | `'1'` is a force-disable kill switch |
| `QWEN_CODE_MAX_WORKFLOW_CONCURRENCY` | `max(1, min(16, cpus-2))` | 64 | sliding window per run |
| `QWEN_CODE_MAX_WORKFLOW_AGENTS` | 1000 | 10000 | total `agent()` calls per run |
| `QWEN_CODE_MAX_WORKFLOW_SECONDS` | 1800 (30 min) | n/a | wall-clock per run |
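
How an override, a default, and a hard ceiling combine can be sketched as below. The env var names, defaults, and ceilings come from the table; the `resolveCap` helper and its treatment of invalid values (fall back to the default) are assumptions for illustration.

```javascript
const os = require('node:os');

// Hypothetical resolver: use the env override when it is a positive integer,
// otherwise the default, and never exceed the hard ceiling.
function resolveCap(rawValue, defaultValue, hardCeiling) {
  const parsed = Number.parseInt(rawValue ?? '', 10);
  const requested = Number.isInteger(parsed) && parsed > 0 ? parsed : defaultValue;
  return Math.min(requested, hardCeiling);
}

const defaultConcurrency = Math.max(1, Math.min(16, os.cpus().length - 2));

const concurrency = resolveCap(
  process.env.QWEN_CODE_MAX_WORKFLOW_CONCURRENCY,
  defaultConcurrency,
  64,
);
const maxAgents = resolveCap(process.env.QWEN_CODE_MAX_WORKFLOW_AGENTS, 1000, 10000);

console.log({ concurrency, maxAgents });
```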

### P3+ injection seams already pre-wired in P1/P2

- `agent({schema})`, `agent({agentType})`, `agent({isolation})`, `agent({model})` are all STUB-THROW today — P3 replaces the throw with the real implementation, no sandbox re-opening needed
- `SandboxOptions.budget` interface is wired (default `spent()`/`remaining()` throw) — P5 injects the real implementation through the existing seam
- `SandboxOptions.parallel` / `SandboxOptions.pipeline` are populated (P2)
- `resumeFromRunId` + JSONL journal: NOT pre-wired — P6 is a net-new subsystem
- Ultracode session mode + `workflowKeywordTriggerEnabled`: NOT pre-wired — P7 is net-new
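
To make the budget seam concrete, here is a sketch of a throwing default next to a tracked replacement. `createStubBudget`, `createTrackedBudget`, and `record` are hypothetical names, and returning `Infinity` from `remaining()` when `total` is `null` is an assumption, not confirmed behaviour.

```javascript
// Default seam: `total` is known, but usage accessors throw until wired.
function createStubBudget(total = null) {
  const notWired = (name) => () => {
    throw new Error(`budget.${name}() is not available yet`);
  };
  return { total, spent: notWired('spent'), remaining: notWired('remaining') };
}

// Hypothetical P5-style budget backed by a running token counter.
function createTrackedBudget(total = null) {
  let spentTokens = 0;
  return {
    budget: {
      total,
      spent: () => spentTokens,
      // Assumption: an uncapped run reports Infinity remaining.
      remaining: () =>
        total === null ? Infinity : Math.max(0, total - spentTokens),
    },
    record: (tokens) => {
      spentTokens += tokens;
    },
  };
}

// Cost-bounded loop in the style of the use case listed earlier.
const { budget, record } = createTrackedBudget(200_000);
let rounds = 0;
while (budget.remaining() > 50_000) {
  record(60_000); // stand-in for one verifier agent's token usage
  rounds += 1;
}
console.log({ rounds, spent: budget.spent(), remaining: budget.remaining() });
```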

### Claude Code 2.1.168 reverse pass — deltas vs 2.1.160 baseline

Binary `strings` cross-compared across 2.1.161 / 2.1.162 / 2.1.168 against the 2.1.160-documented baseline at the top of this issue.

**Unchanged:** concurrent cap `max(1, min(16, cpus-2))`, per-run agent cap `1000`, schema nudge count `2 in-conversation`, wall-clock `30 min`, single-level workflow nesting.

**New upstream features (post-2.1.160) confirmed shipped:**

| Feature | Evidence in binary | Affects qwen plan |
|---|---|---|
| `agent({schema})` enforcement | error string: `"subagent completed without calling StructuredOutput (after 2 in-conversation nudges)"` | P3 — contract locked, error msg should match |
| `agent({agentType})` | error string: `"agent({agentType}): agent type '{agentType}' not found"` | P3 — contract locked |
| `agent({isolation:'worktree'})` | strings present | P3 — match implementation |
| `agent({isolation:'remote'})` | error string: `"agent({isolation:'remote'}) is not available in this build"` | P3 — keep parity, document as not-available |
| Budget telemetry | `tengu_workflow_budget_cap_exceeded` | P5 — telemetry name aligned |
| Resume telemetry | `tengu_workflow_journal_started_hit_respawn` | P6 — pattern validated |
| Agent memory (2.1.168 new) | `Scope for auto-loading agent memory files. 'user' - ~/.claude/agent-memory/<agentType>/, 'project' - .claude/agent-memory/<agentType>/` | NEW: not in 2.1.160 baseline; candidate for a post-P7 follow-up |

**Verdict:** Workflow surface is stable across 2.1.160 → 2.1.168 for everything #4721 covers. Upstream has already shipped P3 / P5 / P6 (only `isolation:'remote'` is gated off), giving qwen a clear contract to match. Agent-memory scoping by `agentType` is new in 2.1.168 — out of #4721's original scope, candidate for a post-P7 follow-up if there's appetite.
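
The nudge contract that P3 needs to match can be sketched as a bounded retry loop. This is an illustration under assumptions: `runWithNudges`, `runTurn`, and `validate` are hypothetical, and treating a schema-invalid output the same way as a missing `StructuredOutput` call is a simplification. Only the error text is taken from the binary string quoted above.

```javascript
// Hypothetical sketch. `runTurn(nudgeCount)` stands in for one subagent turn
// and resolves to the structured output, or undefined if the subagent
// finished without calling StructuredOutput.
async function runWithNudges(runTurn, validate, maxNudges = 2) {
  // One initial attempt plus up to `maxNudges` nudged attempts.
  for (let nudges = 0; nudges <= maxNudges; nudges += 1) {
    const output = await runTurn(nudges);
    if (output !== undefined && validate(output)) return output;
  }
  throw new Error(
    `subagent completed without calling StructuredOutput (after ${maxNudges} in-conversation nudges)`,
  );
}

async function demo() {
  const isVerdict = (value) => typeof value.refuted === 'boolean';
  // Succeeds only on the second nudge (third turn).
  const recovered = await runWithNudges(
    async (nudges) => (nudges < 2 ? undefined : { refuted: false }),
    isVerdict,
  );
  // Never produces structured output.
  const failed = await runWithNudges(async () => undefined, isVerdict).catch(
    (err) => err.message,
  );
  return { recovered, failed };
}

demo().then((out) => console.log(out));
```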

### Adjacent infrastructure reuse confirmed on origin/main

| Phase | Existing subsystem | Reuse path | Estimated savings |
|---|---|---|---|
| **P3** schema + agentType | `SubagentManager` + `agent-frontmatter-schema.ts` (#4842 + #4996 already on main, CC 2.1.168 parity) | `await config.getSubagentManager().findSubagentByName(agentType, 'project')` returns `SubagentConfig` carrying `permissionMode`, `maxTurns`, `mcpServers`, `hooks`, `tools`, `disallowedTools`, `color`; merge into workflow's hardcoded disallow floor | ~300 LOC |
| **P3** isolation:'worktree' | qwen-code worktree subsystem (already in use elsewhere) | Wire `agent({isolation:'worktree'})` to spawn subagent in a fresh worktree; reuse cleanup helpers | (new wiring, not raw reuse) |
| **P4** UI | `BackgroundTasksPill` + `BuiltinCommandLoader` | Extend `TaskKind` union with `'workflow'`, append to `KIND_NAMES` map, register `workflowsCommand` mirroring `tasksCommand` / `hookCommand` patterns | ~200 LOC |
| **P5** budget | `AgentStatistics` per-agent token tracker | `WorkflowBudgetImpl.spent()` sums per-phase agent tokens; populate the existing P1 `SandboxOptions.budget` seam | ~100-150 LOC |
| **P6** resume | `jsonl-utils` (`writeLine`/`readLines`/`countLines`) + `FileHistoryService` serialization pattern (#4897) | New `WorkflowStateRecord` JSONL type; append on each phase completion; on resume, longest-prefix-matched replay | ~200 LOC |
| **P7** keyword trigger | `config.ts` settings layer | `workflowKeywordTriggerEnabled` mirroring `agentTeamEnabled` / `forkSubagentEnabled` patterns | ~50 LOC |

**Agent Team (#4844) integration risk = LOW**: Team and Workflow share the same `SubagentManager` registry so `agentType` semantics align, but their call sites (`team.spawn()` vs `workflow.agent()`) don't overlap. Documentation will need to disambiguate "which to use when".

### Refined phase plan (no scope drift, refined LOC estimates)

The plan above remains the authoritative scope; this is a refined estimate based on the confirmed reuse paths. **P3 ships as a single PR** covering `schema + agentType + isolation:'worktree'` rather than splitting — the model-facing API and the sandbox-execution wiring are easier to review together than as two coupled PRs.

| Phase | Refined LOC est. (src + tests) | Net-new subsystems | Notes |
|---|---|---|---|
| **P3** schema + agentType + isolation:'worktree' | ~1200-1500 | none (all reuse) | Wires existing `SubagentManager`, existing worktree subsystem, the new `StructuredOutput` contract |
| **P4** /workflows + UI + extractAndStripMeta | ~700-900 | none | Extends existing `BackgroundTasksPill` / `BuiltinCommandLoader` |
| **P5** budget | ~400-500 | none | Populates the existing `SandboxOptions.budget` seam |
| **P6** resume | ~600-800 | `WorkflowStateRecord` JSONL type | Reuses `jsonl-utils` |
| **P7** ultracode + keyword trigger | ~150-250 | none | Settings-only |

Remaining total ≈ 3050-3950 LOC across 5 PRs (average ~610-790 LOC per PR, versus ~2100 per PR for P1 + P2, so reviews should be easier).

