feat: background subagents with headless and SDK support#3076
Conversation
Enable sub-agents to run asynchronously via `run_in_background: true` parameter. Background agents execute independently from the parent, which receives an immediate launch confirmation and continues working. A notification is injected into the parent conversation when the background agent completes. Key changes: - BackgroundTaskRegistry tracks lifecycle of background agents - Agent tool gains async execution path with fire-and-forget semantics - Background agents use YOLO approval mode to prevent deadlock - Independent AbortControllers survive parent ESC cancellation - CLI bridges notifications via useMessageQueue for between-turn delivery - State race guards prevent complete/fail after cancellation - Session cleanup aborts all running background agents
📋 Review SummaryThis PR introduces a well-designed background agent execution feature that enables sub-agents to run asynchronously with proper lifecycle management and notification delivery. The implementation demonstrates solid architectural thinking with careful attention to race conditions, cleanup, and user experience. Overall, this is a high-quality implementation that follows existing patterns in the codebase. 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
- Add prefix/separator protocol to distinguish background notifications from user input - Show concise summary in UI while sending full details to LLM - Add 'notification' history item type with specialized display - Add 'background' agent status for background-running agents - Prevent notifications from polluting prompt history (up-arrow) - Truncate long descriptions in display text This improves the UX for background agents by showing cleaner, more concise notifications while preserving full context for the LLM. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Headless mode skips AppContainer, so the notification callback is never registered and background agent results would be silently dropped. Return an error prompting the model to retry without run_in_background.
…tification queue Replace the stringly-typed \x00__BG_NOTIFY__\x00 prefix/separator encoding with a typed notification path using SendMessageType.Notification. - Add SendMessageType.Notification to the enum - Change BackgroundNotificationCallback to emit (displayText, modelText) - Move notification queue from AppContainer into useGeminiStream (mirrors the cron queue pattern): register on registry, queue structured items, drain on idle via submitQuery - prepareQueryForGemini short-circuits for Notification type (skips slash commands, shell mode, @-commands, prompt history logging) - Remove BACKGROUND_NOTIFICATION_PREFIX/SEPARATOR constants
Background agent cleanup belongs in Config.shutdown() alongside other resource teardown (skillManager, toolRegistry, arenaRuntime), not in AppContainer's registerCleanup. This also ensures headless mode gets cleanup for free.
da16bf6 to
bcce78d
Compare
Background agent notifications were missing after session resume because they were never recorded in the chat history. The model text was absent from the API history and the display item was lost. - Add recordNotification() to ChatRecordingService — stores as user-role message with subtype 'notification' and displayText payload - Thread notificationDisplayText through submitQuery → sendMessageStream - Restore as HistoryItemNotification in resumeHistoryUtils
Background agents were using YOLO approval mode which auto-approves all tool calls — too permissive. Replace with shouldAvoidPermissionPrompts which auto-denies tool calls that need interactive approval, matching claw-code's approach. The permission flow for background agents is now: 1. L3/L4 permission rules (allow/deny) — same as foreground 2. Approval mode overrides (AUTO_EDIT for edits) — same as foreground 3. PermissionRequest hooks — can override the denial 4. Auto-deny — if no hook decided, deny because prompts are unavailable
…ion time Identity-shaping fork inputs (parent history, generationConfig, tool decls, env-skip flag) were threaded through `AgentHeadless.execute()`'s options bag and re-passed by the SubagentStop hook retry loop. They belong on the agent's construction-time configs, not its per-invocation options. - PromptConfig gains `renderedSystemPrompt` (verbatim, bypasses templating and userMemory injection) and drops the `systemPrompt`/`initialMessages` XOR so fork can carry both. createChat skips env bootstrap when `initialMessages` is non-empty. - AgentHeadless.execute() shrinks to (context, signal?). Fork dispatch in agent.ts builds synthetic PromptConfig/ModelConfig/ToolConfig from the parent's cache-safe params and calls AgentHeadless.create directly (bypassing SubagentManager). Parent's tool decls flow through verbatim including the `agent` tool itself for cache parity. - Recursive-fork prevention switches from fork-side tool stripping to a runtime guard. The previous `isInForkChild(history)` helper was dead code (it scanned the main GeminiClient's history, not the fork child's chat). Replaced with `isInForkExecution()` backed by AsyncLocalStorage: the fork's background execution runs inside `runInForkContext`, and the ALS frame propagates through the standard async chain into nested AgentTool.execute() calls where the guard fires.
…ectory Move agent.ts, agent.test.ts, and fork-subagent.ts under tools/agent/ and update all import paths accordingly.
These fields were never populated from subagent frontmatter and served no purpose in the fork path either. The ModelConfig interface retains only the actively-used model field.
…CacheSafeParams Fork subagent now reads system instruction and tool declarations from the live GeminiChat via getGenerationConfig() instead of the global getCacheSafeParams() snapshot. This removes the cross-module coupling between the agent tool and the followup infrastructure.
…ly inline decls prepareTools() treated asStrings.length === 0 as "add all registry tools", which is correct when no tools are specified at all, but wrong when the caller provides only inline FunctionDeclaration[] (no string names). The fork path passes parent tool declarations as inline decls for cache parity, so prepareTools was adding the full registry set on top — duplicating every non-excluded tool. Add onlyInlineDecls.length === 0 to the condition so that pure-inline toolConfigs bypass the registry entirely.
…t-construction-time # Conflicts: # packages/core/src/core/client.ts # packages/core/src/tools/agent/agent.ts
Resolve conflicts in client.ts (keep both notificationDisplayText and modelOverride fields), useGeminiStream.ts (combine both options), and agent.ts (integrate background subagent feature with fork subagent refactor and resolvedMode improvements).
…ion-time' into feat/background-subagent Resolve conflict in agent.ts: adopt #3255's runSubagentWithHooks method and fork dispatch, add background execution path before the fork/normal dispatch with its own hook firing and fire-and-forget pattern.
Subagent definitions can now declare `background: true` in their YAML frontmatter to always run as background tasks. This is OR'd with the `run_in_background` tool parameter — useful for monitors, watchers, and proactive agents so the LLM doesn't need to remember to set the flag.
- Inherit bgConfig from agentConfig so the resolved approval mode is preserved for background agents (foreground would run AUTO_EDIT but background fell back to DEFAULT, which combined with shouldAvoid- PermissionPrompts would auto-deny every permission request). - Honor SubagentStop blocking decisions in background runs by looping on hook output up to 5 iterations, matching runSubagentWithHooks. - Check terminate mode before reporting completion; non-GOAL modes (ERROR, MAX_TURNS, TIMEOUT) are now reported as failures instead of emitting a success notification for an incomplete run. - Exclude SendMessageType.Notification from the UserPromptSubmit hook guard so background completion messages are not rewritten or blocked as if they were user input.
…#3379) * feat(cli): unify notification queue for cron and background agents Migrate cron from its own queue (cronQueueRef / cronQueue) to the shared notification queue used by background agents. Both producers now push the same item shape { displayText, modelText, sendMessageType } and a single drain effect / helper processes them in FIFO order. Cron fires render as HistoryItemNotification (● prefix) instead of HistoryItemUser (> prefix), with a "Cron: <prompt>" display label. Records use subtype 'cron' for clean resume and analytics separation. Lift the non-interactive rejection for background agents. Register a notification callback in nonInteractiveCli.ts with a terminal hold-back phase (100ms poll) that keeps the process alive until all background agents complete and their notifications are processed. * feat(cli): emit SDK task events for background subagents Emit `task_started` when a background agent registers and `task_notification` when it completes, fails, or is cancelled, so headless/SDK consumers can track lifecycle without parsing display text. Model-facing text is now structured XML with status, summary, truncated result, and usage stats. Completion stats (tokens, tool uses, duration) are captured from the subagent and included in both the SDK payload and the model XML. * fix: address codex review issues for background subagents - Background subagents now inherit the resolved approval mode from agentConfig instead of the raw session config, so a subagent with `approvalMode: auto-edit` (or execution in a trusted folder) keeps that override when it runs asynchronously. - Non-interactive cron drains are single-flight: concurrent cron fires now await the same in-flight drain, and the cron-done check gates on it, preventing the final result from being emitted while a cron turn is still streaming. - Background forks go through createForkSubagent so they retain the parent's rendered system prompt and inherited history instead of degrading to a plain FORK_AGENT. * fix(cli): restore cancellation, approval, and error paths in queued drain - Hold-back loop now reacts to SIGINT/SIGTERM: when the main abort signal fires it calls registry.abortAll() so background agents with their own AbortControllers stop promptly instead of pinning the process open. - Queued-turn tool execution forwards the stream-json approval update callback (onToolCallsUpdate) so permission-gated tools inside a background-notification follow-up emit can_use_tool requests. - Queued-turn stream loop mirrors the main loop's text-mode handling of GeminiEventType.Error, writing to stderr and throwing so provider errors produce a non-zero exit code instead of silently succeeding. - Interactive cron prompts go through the normal slash/@-command/shell preprocessing again; only Notification messages skip that path. * fix(cli): skip duplicate user-message item for cron prompts Cron prompts already render as a `● Cron: …` notification via the queue drain, so adding them again as a `USER` history item produced a duplicate `> …` line. * fix(cli): honor SIGINT/SIGTERM during cron scheduler wait The non-interactive cron phase awaits a Promise that resolves only when scheduler.size reaches 0 and no drain is in flight. Recurring cron jobs never drop the scheduler size to 0 on their own, so the previous abort handling (added to the hold-back loop) was unreachable — the process hung indefinitely after SIGINT/SIGTERM. Attach an abort listener inside the promise so abort stops the scheduler and resolves immediately, allowing the hold-back loop to run and the process to exit cleanly. * feat(core): propagate tool-use id through background agent notifications Plumb the scheduler's callId into AgentToolInvocation via an optional setCallId hook on the invocation, detected structurally in buildInvocation. The agent tool forwards it as toolUseId on the BackgroundTaskRegistry entry so completion notifications can carry a <tool-use-id> tag and SDK task_started / task_notification events can emit tool_use_id — letting consumers correlate background completions back to the original Agent tool-use that spawned them. * fix(cli): drain single-flight race kept task_notification from emitting drainLocalQueue wrapped its body in an async IIFE and cleared the promise reference via finally. When the queue is empty the IIFE has no awaits, so its finally runs synchronously as part of the RHS of the assignment `drainPromise = (async () => {...})()` — clearing drainPromise BEFORE the outer assignment overwrites it with the resolved promise. The reference then stayed stuck on that fulfilled promise forever, so later calls short-circuited through `if (drainPromise) return drainPromise` and never processed queued notifications. Symptom: in headless `--output-format json` (and `stream-json`), task_started emitted but task_notification never did, even after the background agent completed. The process sat in the hold-back loop until SIGTERM. Fix: move the null-clearing out of the async body into an outer `.finally()` on the returned promise. `.finally()` runs as a microtask after the current synchronous block, so it clears the latest drainPromise reference instead of the pre-assignment null. * fix(cli): append newline to text-mode emitResult so zsh PROMPT_SP doesn't erase the line Headless text mode wrote `resultMessage.result` without a trailing newline. In a TTY, zsh themes that use PROMPT_SP (powerlevel10k, agnoster, …) detect the missing `\n` and emit `\r\033[K` before drawing the next prompt, which wipes the final line off the screen. Pipe-captured output was unaffected, so the bug only surfaced for interactive shell users — most visibly in the background-agent flow where the drain-loop's final assistant message is the *only* stdout write in text mode. Append `\n` to both the success (stdout) and error (stderr) writes. * docs(skill): tighten worked-example blurb in structured-debugging Mirror the simplified blurb from .claude/skills/structured-debugging/SKILL.md (knowledge repo). Drops the round-by-round narrative; keeps the contradiction + two lessons. * docs(skill): mirror SKILL.md improvements (reframing failure mode, generalized path, value-logging guidance) Mirror of knowledge repo commit 38eb28d into the qwen-code .qwen/skills copy. * docs(skill): mirror worked example into .qwen/skills/structured-debugging/ Mirrors knowledge/.claude/skills/structured-debugging/examples/ headless-bg-agent-empty-stdout.md so the .qwen copy of the skill links resolve. * docs(skill): mirror generalized side-note path guidance * fix(cli): harden headless cron and background-agent failure paths Three regressions surfaced by Codex review of feat/background-subagent: - Cron drain rejections were dropped by a bare `void`, so a failing queued turn left the outer Promise unresolved and hung the run. Route drain failures through the Promise's reject so they propagate to the outer catch. - The background-agent registry entry was inserted before `createForkSubagent()` / `createAgentHeadless()` was awaited. Failed init returned an error from the tool call but left a phantom `running` entry, and the headless hold-back loop (`registry.getRunning()`) waited forever. Register only after init succeeds. - SIGINT/SIGTERM during the hold-back phase aborted background tasks, then fell through to `emitResult({ isError: false })`, so a cancelled `qwen -p ...` exited 0 with the prior assistant text. Route through `handleCancellationError()` so cancellation exits non-zero, matching the main turn loop. * test(cli): update stdout/stderr assertions for trailing newline `feadf052f` appended `\n` to text-mode `emitResult` output, but the nonInteractiveCli tests still asserted the pre-change strings. Update the 11 affected assertions to expect the trailing newline. * fix: address review comments on background-agent notifications Four additional issues from the PR review that the prior regression-fix commit didn't cover: - Escape XML metacharacters when interpolating `description`, `result`, `error`, `agentId`, `toolUseId`, and `status` into the task-notification envelope. Subagent output (which itself may carry untrusted tool output, fetched HTML, or another agent's notification) could contain `</result>` or `</task-notification>` and forge sibling tags the parent model would treat as trusted metadata. Truncate result text *before* escaping so the truncation never slices through an entity like `&`. - Emit the terminal notification from `cancel()` and `abortAll()`. The fire-and-forget `complete()`/`fail()` from the subagent task is guarded by `status !== 'running'` and was no-op'd after cancellation, so SDK consumers saw `task_started` with no matching `task_notification`, breaking the contract this PR establishes. Updated two race-guard tests that asserted the old behavior. - Call `adapter.finalizeAssistantMessage()` before the abort-triggered early return inside `drainOneItem`'s stream loop. Without it, `startAssistantMessage()` had already been called, so stream-json mode left `message_start` unpaired. - Enforce `config.getMaxSessionTurns()` in `drainOneItem` for symmetry with the main turn loop. Cron fires and notification replies otherwise bypass the budget cap in headless runs.
E2E Test ReportTest plan: Results
Highlights
Test plan adjustmentJ1's initial run flagged SkippedI1 (headless mixed cron + bg agent with 120s timeout) — the CLI process was still in initial context loading when the run was halted; the headless mixed path is already exercised via F1/F2 (bg agent drain) and G1/H1 (cron unified queue) in interactive mode, so signal loss is low. |
# Conflicts: # packages/core/src/tools/agent/agent.ts
- Wrap background fork execute() in runInForkContext so the recursive-fork guard (AsyncLocalStorage-based) fires when a background fork's child model calls `agent` again. Previously only the foreground fork path was wrapped, so background forks could spawn nested implicit forks. - Emit queued terminal task_notifications on SIGINT/SIGTERM before handleCancellationError exits. abortAll() enqueues cancellation notifications via the registry callback, but the process was exiting before the drain loop had a chance to flush them — leaving stream-json consumers that already saw task_started without a matching terminal task_notification. Extracted the SDK-emit block into a shared emitNotificationToSdk helper reused by the normal drain and the cancellation flush. - Skip notification/cron subtypes in ACP HistoryReplayer. These records are persisted as type: 'user' so the model's chat history keeps them for continuity, but they were never user input — replaying them leaked raw <task-notification> XML (and cron prompts) back into the ACP session as if the user typed them.
…newline Commit 0da1182 appended a newline to text-mode emitResult output (zsh PROMPT_SP fix) and updated the nonInteractiveCli tests, but four assertions in JsonOutputAdapter.test.ts were missed. Update them to expect the trailing newline so CI passes.
- Extract the SubagentStop hook blocking-decision loop into a runSubagentStopHookLoop helper so the foreground and background paths no longer duplicate the iteration/abort/log scaffolding. - Unify BackgroundTaskRegistry.abortAll to delegate to cancel, removing copy-pasted abort/notification bookkeeping. - Drop the unused findByName and BackgroundAgentEntry.name field. - In nonInteractiveCli drain, hoist inputFormat and toolCallUpdateCallback out of the inner tool loop, and drop the unreachable try/catch around the readonly registry. - Trim boilerplate doc/narration comments while keeping load-bearing WHY comments.
- Use tool callId (or short random suffix) instead of Date.now() for background agentIds; avoids registry collisions when parallel same-type agents launch in the same millisecond. - Reset loopDetector and lastPromptId for Notification turns so a prior turn's loop count doesn't trip LoopDetected on the notification response. - Replay notification/cron displayText in ACP HistoryReplayer so the assistant reply has an antecedent in resumed transcripts.
* feat(core): managed background shell pool with /bashes command Replace shell.ts's `&` fork-and-detach background path with a managed process registry. Background shells now have observable lifecycle, captured output, and explicit cancellation — matching the pattern used by background subagents (#3076). Phase B from #3634 (background task management roadmap). What changes - New `BackgroundShellRegistry` (services/backgroundShellRegistry.ts): per-process entry with status (running / completed / failed / cancelled), AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op). Mirrors the lifecycle shape of #3471's BackgroundTaskRegistry so the two can be unified later. - `shell.ts` is_background path rewritten as `executeBackground`: - Spawns the unwrapped command (no '&', no pgrep envelope) - Streams stdout to `<projectDir>/tasks/<sessionId>/shell-<id>.output` (path layout aligns with the direction sketched in #3471 review) - Bridges the external abort signal into the entry's AbortController so a single source of truth governs cancellation - Returns immediately with id + output path; agent's turn isn't blocked - Settles the registry entry asynchronously when ShellExecutionService resolves: complete (clean exit) / fail (error) / cancel (aborted) - Removes ~120 lines of dead bg-specific code from shell.ts: pgrep wrapping, '&' appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup - New `/bashes` slash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message. What this PR doesn't do - Footer pill / dialog integration — gated on #3488 landing - task_stop / send_message integration — gated on #3471 landing - Auto-backgrounding heuristics for long foreground bash — Phase D Test plan - 11 registry unit tests (state machine + idempotent terminal transitions) - 4 background-path tests in shell.test.ts (spawn no-wrap + complete / fail / cancel settle paths) - 2 /bashes command tests (empty + populated) - Full core suite: 247 files / 6075 passed (existing tests unaffected) * fix(core): address PR #3642 review feedback Three [Critical] from the auto review + naming alignment with Claude Code: - shell.ts settle: non-zero exit code or termination signal now bucket into `failed` instead of `completed`. The previous `if (result.error) fail else complete()` would misreport `false` / failed `npm test` as success because ShellExecutionService surfaces ordinary command failures as a non-zero exitCode with `error: null`. Failure reason carries the exit code or signal so `/tasks` shows the real cause. - ShellExecutionService.childProcessFallback: add `streamStdout` mode that emits each decoded chunk through the existing onOutputEvent path. The default (foreground) path continues to buffer + emit the cleaned final blob, so existing in-line shell calls are unaffected. executeBackground opts in via `{ streamStdout: true }`, which is what makes the captured output file actually useful for long-running processes (dev servers, watchers) — without it the file stayed empty until the process exited. - shell.ts test fixture: cancel-settle test was using `signal: 'SIGTERM'` but `ShellExecutionResult.signal` is `number | null`. TS2322 broke the build; switched to `signal: null`. Added a test that explicitly covers the new "non-zero exit → failed" path so the bucketing change has regression coverage. - shell.ts comment: explicitly document why background shells force `shouldUseNodePty=false` (no terminal, no human; node-pty would be dead weight for fire-and-forget commands). - /bashes → /tasks (alias bashes), description "List and manage background tasks" — matches Claude Code's command name. Currently lists shells only; will surface other task kinds (subagents, monitor) as those registries land via #3471 / #3488. * fix(core): address PR #3642 second-round review feedback - shellExecutionService streaming: drop stdout/stderr buffer + outputChunks accumulation in streaming mode. Each decoded chunk goes straight to onOutputEvent and is GC-eligible immediately. Long-running background commands (dev servers, watchers) no longer accumulate unbounded memory proportional to total output. Buffered (foreground) mode is unchanged. - shell.ts executeBackground: stripAnsi each chunk before writing to the output file. Dev servers / build tools spam color codes and cursor-move sequences that would render as garbage in the file the agent reads. - bashesCommand: command description "List and manage" → "List background tasks" — current implementation only supports listing, cancellation follows when the unified task_stop tool from #3471 is wired in. Replace the hand-rolled formatRuntime helper with the shared formatDuration utility (uses hideTrailingZeros for parity with the previous output). - backgroundShellRegistry: add a comment documenting the lack of an eviction policy as a known limitation. LRU / age-based / capped-size eviction (and on-disk output rotation) is left as a follow-up alongside the broader output-file lifecycle story. * fix(core): address PR #3642 third-round review feedback - shell.ts executeBackground: add 'error' listener on the output write stream. fs.createWriteStream surfaces write failures (disk full, permission, fs going away) as 'error' events; without a listener Node treats it as an uncaught exception and kills the entire CLI session. Log + drop is the sane default — the registry still settles via resultPromise so /tasks shows the right terminal status. - shell.ts executeBackground: store the abort handler reference and removeEventListener in the settle callback. Background shells outlive the turn signal; the dangling listener was keeping `entryAc` (and transitively `outputStream`) reachable until the turn signal itself was GC'd, which for long sessions would never happen. - shell.test.ts: extend the createWriteStream mock with an `on` stub so the new error-listener wiring doesn't crash the test suite. * refactor(cli): drop /bashes alias and rename file to tasksCommand Per follow-up review: the slash command should be exclusively /tasks. Removes the `bashes` altName, renames `bashesCommand{,.test}.ts` → `tasksCommand{,.test}.ts`, renames the exported binding `bashesCommand` → `tasksCommand`, and cleans up the remaining `/bashes` references in backgroundShellRegistry.ts comments. No behavior change beyond the alias removal. * refactor(cli): finish tasksCommand rename — apply content changes The previous commit (03c8503) only captured the file rename via `git mv`; the export name change (`bashesCommand` → `tasksCommand`), the removal of `altNames: ['bashes']`, the import update in BuiltinCommandLoader, and the `/bashes` → `/tasks` comments in backgroundShellRegistry.ts were unstaged when that commit landed. Squash candidate before merge. * fix(core): address PR #3642 fourth-round review feedback Four reviewer concerns from @wenshao + @doudouOUC: - [Critical] Config.shutdown() now also calls `backgroundShellRegistry.abortAll()`. Previously only the subagent registry was aborted, so a managed background shell could outlive the CLI process and orphan its child. Symmetric with how `BackgroundTaskRegistry.abortAll()` is wired in. - [P1] shell.ts executeBackground strips a trailing `&` from the command before spawn. The managed path is itself the backgrounding mechanism; forwarding `node server.js &` verbatim made bash exit immediately while the real child outlived the wrapper, causing the registry to settle as `completed` while the shell was still running and chunked output to land on a closed stream. Strip + warn. - [P2] Output file moves under `storage.getProjectTempDir()` (specifically `<projectTempDir>/background-shells/<sessionId>/shell-<id>.output`). `ReadFileTool` already auto-allows the project temp dir, so the LLM can `Read` the captured output without bouncing off a permission prompt — important because background-agent contexts can't surface interactive prompts. - [P2] Background shells are no longer killed when the current turn's AbortSignal fires. Forwarding the turn signal into the entry's AbortController meant a Ctrl+C on the turn would also terminate intentionally backgrounded dev servers / watchers, contradicting the independent-lifecycle promise. Cancellation now flows only through `entryAc` (driven by future `task_stop` integration via #3471). Tests: - New `abortAll` registry tests cover running / mixed / empty cases. - `runs background commands as managed pool entries` test stops asserting the wrapper-vs-entry signal identity since they're now structurally separate (no turn-to-entry forwarding). - New `does not forward the turn signal into the background shell` test pins the new behavior. - New `strips trailing & from the spawned command` test pins the strip. - Removed the cancel-via-outer-signal settle test — that path no longer exists; cancellation is exercised end-to-end via the registry's own `cancel` and `abortAll` tests in `backgroundShellRegistry.test.ts`. * fix(core): tighten trailing & strip — narrow regex + ReDoS-safe Two reviewer concerns on the same line of #3642 round 4: - [Critical CodeQL] `\s*&+\s*$` is a polynomial-time regex on uncontrolled input (long all-`&` strings backtrack quadratically). - [P2 doudouOUC] `&+` is too greedy: it also rewrites `npm run dev &&` into `npm run dev` (breaks logical AND syntax) and `echo foo \&` into `echo foo \` (eats the escaped literal). Only the bare bash background operator should be stripped. Replace the regex with a small linear-time helper `stripTrailingBackgroundAmp` that explicitly checks for the three "don't touch" cases (`&&`, `\&`, no trailing `&`). Plain `endsWith` / `slice` — no regex backtracking, and the intent reads off the page. Tests: - Existing strip-trailing-`&` test still passes. - New `does not strip a trailing &&` test pins the logical-AND case. - New `does not strip an escaped trailing \\&` test pins the escape case. * fix(core): keep binary-detection sniff in streaming mode @doudouOUC noted that `streamStdout` shortcut returned before the binary-sniff path, so a background command emitting binary bytes (`cat /bin/ls`, image dump, etc.) would be text-decoded and appended to the task output file unbounded. Restructure handleOutput so the sniff-and-cutover logic runs in both modes: - Both modes accumulate up to MAX_SNIFF_SIZE for the binary check. The accumulator is bounded; once the threshold is reached, it stops growing in streaming mode (dropped on binary detection / left inert on text confirmation) and continues to accumulate in buffered mode (existing foreground behavior). - Streaming mode emits 'binary_detected' as soon as `isBinary` trips so the consumer can stop writing the output file. Up to ~4KB of bytes may have been emitted as text chunks before detection — this is bounded and acceptable; the unbounded write is the pathology reviewers flagged. - Streaming text mode still emits each decoded chunk immediately and does not accumulate stdout/stderr strings, so long-running text streams remain GC-friendly. - Buffered (foreground) behavior is unchanged — the sniff accumulator is the same path the existing tests cover. Tests: 50 shellExecutionService + 11 backgroundShellRegistry + 57 shell.test.ts all pass; no regressions. * fix(core): tighten streaming sniff bound + Windows rmSync flake Two unrelated reds on the latest CI run: 1. [P1 doudouOUC] Streaming sniff buffer leaks on small chunks. The previous fix recomputed `sniffedBytes` from `Buffer.concat(outputChunks.slice(0, 20)).length` on every chunk — pinned to the first 20 chunks. If those total under MAX_SNIFF_SIZE (line-sized stdout, e.g. dev-server logs) the byte count never grew, the sniff branch stayed open forever, and `outputChunks` accumulated every later chunk — exactly the leak `streamStdout` was meant to prevent. Track sniffed bytes by running sum (`sniffedBytes += data.length`) so the bound is genuine. When sniff confirms text in streaming mode, drop the accumulator immediately so subsequent chunks fall through the streaming emit path without ever touching it. 2. file-exporters.test.ts afterEach `fs.rmSync` flaked on Windows (ENOTEMPTY: directory not empty). The exporter's underlying write stream hasn't always released its handle by the time `rmSync` runs. Pass `maxRetries: 5, retryDelay: 50` so the cleanup retries through the brief Windows handle-release window instead of failing the test on a CI quirk. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
Captures current state of the bg-agent subsystem in the fork (what's already wired, what is not), maps the upstream qwen-code PRs we have not yet ported (QwenLM#3076 → QwenLM#3739), and sketches three phases to close the gap: - Phase A: model-facing agent control + event monitor (QwenLM#3471, QwenLM#3684, QwenLM#3687) - Phase B: TUI surface + /tasks command (QwenLM#3488, QwenLM#3642) - Phase C: cross-session resume (QwenLM#3739) Also calls out cross-cutting decisions we should make before Phase A lands: settings layout, stop-tool naming, persistence shape, gateway validation. This is a planning doc, not a spec. Per-phase code-level designs come later. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(core): add run_in_background support for Agent tool Enable sub-agents to run asynchronously via `run_in_background: true` parameter. Background agents execute independently from the parent, which receives an immediate launch confirmation and continues working. A notification is injected into the parent conversation when the background agent completes. Key changes: - BackgroundTaskRegistry tracks lifecycle of background agents - Agent tool gains async execution path with fire-and-forget semantics - Background agents use YOLO approval mode to prevent deadlock - Independent AbortControllers survive parent ESC cancellation - CLI bridges notifications via useMessageQueue for between-turn delivery - State race guards prevent complete/fail after cancellation - Session cleanup aborts all running background agents * feat(background): improve notification formatting and UI handling - Add prefix/separator protocol to distinguish background notifications from user input - Show concise summary in UI while sending full details to LLM - Add 'notification' history item type with specialized display - Add 'background' agent status for background-running agents - Prevent notifications from polluting prompt history (up-arrow) - Truncate long descriptions in display text This improves the UX for background agents by showing cleaner, more concise notifications while preserving full context for the LLM. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(background): reject run_in_background in non-interactive mode Headless mode skips AppContainer, so the notification callback is never registered and background agent results would be silently dropped. Return an error prompting the model to retry without run_in_background. * refactor(background): replace prefix/separator protocol with typed notification queue Replace the stringly-typed \x00__BG_NOTIFY__\x00 prefix/separator encoding with a typed notification path using SendMessageType.Notification. - Add SendMessageType.Notification to the enum - Change BackgroundNotificationCallback to emit (displayText, modelText) - Move notification queue from AppContainer into useGeminiStream (mirrors the cron queue pattern): register on registry, queue structured items, drain on idle via submitQuery - prepareQueryForGemini short-circuits for Notification type (skips slash commands, shell mode, @-commands, prompt history logging) - Remove BACKGROUND_NOTIFICATION_PREFIX/SEPARATOR constants * refactor(background): move abortAll to Config.shutdown Background agent cleanup belongs in Config.shutdown() alongside other resource teardown (skillManager, toolRegistry, arenaRuntime), not in AppContainer's registerCleanup. This also ensures headless mode gets cleanup for free. * fix(background): persist notification items for session resume Background agent notifications were missing after session resume because they were never recorded in the chat history. The model text was absent from the API history and the display item was lost. - Add recordNotification() to ChatRecordingService — stores as user-role message with subtype 'notification' and displayText payload - Thread notificationDisplayText through submitQuery → sendMessageStream - Restore as HistoryItemNotification in resumeHistoryUtils * fix(background): replace YOLO with deny-by-default for background agents Background agents were using YOLO approval mode which auto-approves all tool calls — too permissive. Replace with shouldAvoidPermissionPrompts which auto-denies tool calls that need interactive approval, matching claw-code's approach. The permission flow for background agents is now: 1. L3/L4 permission rules (allow/deny) — same as foreground 2. Approval mode overrides (AUTO_EDIT for edits) — same as foreground 3. PermissionRequest hooks — can override the denial 4. Auto-deny — if no hook decided, deny because prompts are unavailable * fix(background): add missing getBackgroundTaskRegistry mock in useGeminiStream tests * refactor(core): move fork subagent params from execute() to construction time Identity-shaping fork inputs (parent history, generationConfig, tool decls, env-skip flag) were threaded through `AgentHeadless.execute()`'s options bag and re-passed by the SubagentStop hook retry loop. They belong on the agent's construction-time configs, not its per-invocation options. - PromptConfig gains `renderedSystemPrompt` (verbatim, bypasses templating and userMemory injection) and drops the `systemPrompt`/`initialMessages` XOR so fork can carry both. createChat skips env bootstrap when `initialMessages` is non-empty. - AgentHeadless.execute() shrinks to (context, signal?). Fork dispatch in agent.ts builds synthetic PromptConfig/ModelConfig/ToolConfig from the parent's cache-safe params and calls AgentHeadless.create directly (bypassing SubagentManager). Parent's tool decls flow through verbatim including the `agent` tool itself for cache parity. - Recursive-fork prevention switches from fork-side tool stripping to a runtime guard. The previous `isInForkChild(history)` helper was dead code (it scanned the main GeminiClient's history, not the fork child's chat). Replaced with `isInForkExecution()` backed by AsyncLocalStorage: the fork's background execution runs inside `runInForkContext`, and the ALS frame propagates through the standard async chain into nested AgentTool.execute() calls where the guard fires. * refactor(core): move agent tool files into dedicated tools/agent/ directory Move agent.ts, agent.test.ts, and fork-subagent.ts under tools/agent/ and update all import paths accordingly. * refactor(core): remove dead temp and top_p fields from ModelConfig These fields were never populated from subagent frontmatter and served no purpose in the fork path either. The ModelConfig interface retains only the actively-used model field. * refactor(core): read parent generation config directly instead of getCacheSafeParams Fork subagent now reads system instruction and tool declarations from the live GeminiChat via getGenerationConfig() instead of the global getCacheSafeParams() snapshot. This removes the cross-module coupling between the agent tool and the followup infrastructure. * fix(core): prevent duplicate tool declarations when toolConfig has only inline decls prepareTools() treated asStrings.length === 0 as "add all registry tools", which is correct when no tools are specified at all, but wrong when the caller provides only inline FunctionDeclaration[] (no string names). The fork path passes parent tool declarations as inline decls for cache parity, so prepareTools was adding the full registry set on top — duplicating every non-excluded tool. Add onlyInlineDecls.length === 0 to the condition so that pure-inline toolConfigs bypass the registry entirely. * feat(core): support agent-level `background: true` in frontmatter Subagent definitions can now declare `background: true` in their YAML frontmatter to always run as background tasks. This is OR'd with the `run_in_background` tool parameter — useful for monitors, watchers, and proactive agents so the LLM doesn't need to remember to set the flag. * fix(core): address background subagent lifecycle gaps - Inherit bgConfig from agentConfig so the resolved approval mode is preserved for background agents (foreground would run AUTO_EDIT but background fell back to DEFAULT, which combined with shouldAvoid- PermissionPrompts would auto-deny every permission request). - Honor SubagentStop blocking decisions in background runs by looping on hook output up to 5 iterations, matching runSubagentWithHooks. - Check terminate mode before reporting completion; non-GOAL modes (ERROR, MAX_TURNS, TIMEOUT) are now reported as failures instead of emitting a success notification for an incomplete run. - Exclude SendMessageType.Notification from the UserPromptSubmit hook guard so background completion messages are not rewritten or blocked as if they were user input. * feat(cli): headless support and SDK task events for background agents (QwenLM#3379) * feat(cli): unify notification queue for cron and background agents Migrate cron from its own queue (cronQueueRef / cronQueue) to the shared notification queue used by background agents. Both producers now push the same item shape { displayText, modelText, sendMessageType } and a single drain effect / helper processes them in FIFO order. Cron fires render as HistoryItemNotification (● prefix) instead of HistoryItemUser (> prefix), with a "Cron: <prompt>" display label. Records use subtype 'cron' for clean resume and analytics separation. Lift the non-interactive rejection for background agents. Register a notification callback in nonInteractiveCli.ts with a terminal hold-back phase (100ms poll) that keeps the process alive until all background agents complete and their notifications are processed. * feat(cli): emit SDK task events for background subagents Emit `task_started` when a background agent registers and `task_notification` when it completes, fails, or is cancelled, so headless/SDK consumers can track lifecycle without parsing display text. Model-facing text is now structured XML with status, summary, truncated result, and usage stats. Completion stats (tokens, tool uses, duration) are captured from the subagent and included in both the SDK payload and the model XML. * fix: address codex review issues for background subagents - Background subagents now inherit the resolved approval mode from agentConfig instead of the raw session config, so a subagent with `approvalMode: auto-edit` (or execution in a trusted folder) keeps that override when it runs asynchronously. - Non-interactive cron drains are single-flight: concurrent cron fires now await the same in-flight drain, and the cron-done check gates on it, preventing the final result from being emitted while a cron turn is still streaming. - Background forks go through createForkSubagent so they retain the parent's rendered system prompt and inherited history instead of degrading to a plain FORK_AGENT. * fix(cli): restore cancellation, approval, and error paths in queued drain - Hold-back loop now reacts to SIGINT/SIGTERM: when the main abort signal fires it calls registry.abortAll() so background agents with their own AbortControllers stop promptly instead of pinning the process open. - Queued-turn tool execution forwards the stream-json approval update callback (onToolCallsUpdate) so permission-gated tools inside a background-notification follow-up emit can_use_tool requests. - Queued-turn stream loop mirrors the main loop's text-mode handling of GeminiEventType.Error, writing to stderr and throwing so provider errors produce a non-zero exit code instead of silently succeeding. - Interactive cron prompts go through the normal slash/@-command/shell preprocessing again; only Notification messages skip that path. * fix(cli): skip duplicate user-message item for cron prompts Cron prompts already render as a `● Cron: …` notification via the queue drain, so adding them again as a `USER` history item produced a duplicate `> …` line. * fix(cli): honor SIGINT/SIGTERM during cron scheduler wait The non-interactive cron phase awaits a Promise that resolves only when scheduler.size reaches 0 and no drain is in flight. Recurring cron jobs never drop the scheduler size to 0 on their own, so the previous abort handling (added to the hold-back loop) was unreachable — the process hung indefinitely after SIGINT/SIGTERM. Attach an abort listener inside the promise so abort stops the scheduler and resolves immediately, allowing the hold-back loop to run and the process to exit cleanly. * feat(core): propagate tool-use id through background agent notifications Plumb the scheduler's callId into AgentToolInvocation via an optional setCallId hook on the invocation, detected structurally in buildInvocation. The agent tool forwards it as toolUseId on the BackgroundTaskRegistry entry so completion notifications can carry a <tool-use-id> tag and SDK task_started / task_notification events can emit tool_use_id — letting consumers correlate background completions back to the original Agent tool-use that spawned them. * fix(cli): drain single-flight race kept task_notification from emitting drainLocalQueue wrapped its body in an async IIFE and cleared the promise reference via finally. When the queue is empty the IIFE has no awaits, so its finally runs synchronously as part of the RHS of the assignment `drainPromise = (async () => {...})()` — clearing drainPromise BEFORE the outer assignment overwrites it with the resolved promise. The reference then stayed stuck on that fulfilled promise forever, so later calls short-circuited through `if (drainPromise) return drainPromise` and never processed queued notifications. Symptom: in headless `--output-format json` (and `stream-json`), task_started emitted but task_notification never did, even after the background agent completed. The process sat in the hold-back loop until SIGTERM. Fix: move the null-clearing out of the async body into an outer `.finally()` on the returned promise. `.finally()` runs as a microtask after the current synchronous block, so it clears the latest drainPromise reference instead of the pre-assignment null. * fix(cli): append newline to text-mode emitResult so zsh PROMPT_SP doesn't erase the line Headless text mode wrote `resultMessage.result` without a trailing newline. In a TTY, zsh themes that use PROMPT_SP (powerlevel10k, agnoster, …) detect the missing `\n` and emit `\r\033[K` before drawing the next prompt, which wipes the final line off the screen. Pipe-captured output was unaffected, so the bug only surfaced for interactive shell users — most visibly in the background-agent flow where the drain-loop's final assistant message is the *only* stdout write in text mode. Append `\n` to both the success (stdout) and error (stderr) writes. * docs(skill): tighten worked-example blurb in structured-debugging Mirror the simplified blurb from .claude/skills/structured-debugging/SKILL.md (knowledge repo). Drops the round-by-round narrative; keeps the contradiction + two lessons. * docs(skill): mirror SKILL.md improvements (reframing failure mode, generalized path, value-logging guidance) Mirror of knowledge repo commit 38eb28d into the qwen-code .qwen/skills copy. * docs(skill): mirror worked example into .qwen/skills/structured-debugging/ Mirrors knowledge/.claude/skills/structured-debugging/examples/ headless-bg-agent-empty-stdout.md so the .qwen copy of the skill links resolve. * docs(skill): mirror generalized side-note path guidance * fix(cli): harden headless cron and background-agent failure paths Three regressions surfaced by Codex review of feat/background-subagent: - Cron drain rejections were dropped by a bare `void`, so a failing queued turn left the outer Promise unresolved and hung the run. Route drain failures through the Promise's reject so they propagate to the outer catch. - The background-agent registry entry was inserted before `createForkSubagent()` / `createAgentHeadless()` was awaited. Failed init returned an error from the tool call but left a phantom `running` entry, and the headless hold-back loop (`registry.getRunning()`) waited forever. Register only after init succeeds. - SIGINT/SIGTERM during the hold-back phase aborted background tasks, then fell through to `emitResult({ isError: false })`, so a cancelled `qwen -p ...` exited 0 with the prior assistant text. Route through `handleCancellationError()` so cancellation exits non-zero, matching the main turn loop. * test(cli): update stdout/stderr assertions for trailing newline `40c16aeb4` appended `\n` to text-mode `emitResult` output, but the nonInteractiveCli tests still asserted the pre-change strings. Update the 11 affected assertions to expect the trailing newline. * fix: address review comments on background-agent notifications Four additional issues from the PR review that the prior regression-fix commit didn't cover: - Escape XML metacharacters when interpolating `description`, `result`, `error`, `agentId`, `toolUseId`, and `status` into the task-notification envelope. Subagent output (which itself may carry untrusted tool output, fetched HTML, or another agent's notification) could contain `</result>` or `</task-notification>` and forge sibling tags the parent model would treat as trusted metadata. Truncate result text *before* escaping so the truncation never slices through an entity like `&`. - Emit the terminal notification from `cancel()` and `abortAll()`. The fire-and-forget `complete()`/`fail()` from the subagent task is guarded by `status !== 'running'` and was no-op'd after cancellation, so SDK consumers saw `task_started` with no matching `task_notification`, breaking the contract this PR establishes. Updated two race-guard tests that asserted the old behavior. - Call `adapter.finalizeAssistantMessage()` before the abort-triggered early return inside `drainOneItem`'s stream loop. Without it, `startAssistantMessage()` had already been called, so stream-json mode left `message_start` unpaired. - Enforce `config.getMaxSessionTurns()` in `drainOneItem` for symmetry with the main turn loop. Cron fires and notification replies otherwise bypass the budget cap in headless runs. * fix: address codex review comments for background subagents - Wrap background fork execute() in runInForkContext so the recursive-fork guard (AsyncLocalStorage-based) fires when a background fork's child model calls `agent` again. Previously only the foreground fork path was wrapped, so background forks could spawn nested implicit forks. - Emit queued terminal task_notifications on SIGINT/SIGTERM before handleCancellationError exits. abortAll() enqueues cancellation notifications via the registry callback, but the process was exiting before the drain loop had a chance to flush them — leaving stream-json consumers that already saw task_started without a matching terminal task_notification. Extracted the SDK-emit block into a shared emitNotificationToSdk helper reused by the normal drain and the cancellation flush. - Skip notification/cron subtypes in ACP HistoryReplayer. These records are persisted as type: 'user' so the model's chat history keeps them for continuity, but they were never user input — replaying them leaked raw <task-notification> XML (and cron prompts) back into the ACP session as if the user typed them. * test(cli): sync JsonOutputAdapter text-mode assertions with trailing newline Commit 11e6505eb appended a newline to text-mode emitResult output (zsh PROMPT_SP fix) and updated the nonInteractiveCli tests, but four assertions in JsonOutputAdapter.test.ts were missed. Update them to expect the trailing newline so CI passes. * refactor: simplify background subagent plumbing - Extract the SubagentStop hook blocking-decision loop into a runSubagentStopHookLoop helper so the foreground and background paths no longer duplicate the iteration/abort/log scaffolding. - Unify BackgroundTaskRegistry.abortAll to delegate to cancel, removing copy-pasted abort/notification bookkeeping. - Drop the unused findByName and BackgroundAgentEntry.name field. - In nonInteractiveCli drain, hoist inputFormat and toolCallUpdateCallback out of the inner tool loop, and drop the unreachable try/catch around the readonly registry. - Trim boilerplate doc/narration comments while keeping load-bearing WHY comments. * fix: address codex review comments for background subagents - Use tool callId (or short random suffix) instead of Date.now() for background agentIds; avoids registry collisions when parallel same-type agents launch in the same millisecond. - Reset loopDetector and lastPromptId for Notification turns so a prior turn's loop count doesn't trip LoopDetected on the notification response. - Replay notification/cron displayText in ACP HistoryReplayer so the assistant reply has an antecedent in resumed transcripts. --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…#3642) * feat(core): managed background shell pool with /bashes command Replace shell.ts's `&` fork-and-detach background path with a managed process registry. Background shells now have observable lifecycle, captured output, and explicit cancellation — matching the pattern used by background subagents (QwenLM#3076). Phase B from QwenLM#3634 (background task management roadmap). What changes - New `BackgroundShellRegistry` (services/backgroundShellRegistry.ts): per-process entry with status (running / completed / failed / cancelled), AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op). Mirrors the lifecycle shape of QwenLM#3471's BackgroundTaskRegistry so the two can be unified later. - `shell.ts` is_background path rewritten as `executeBackground`: - Spawns the unwrapped command (no '&', no pgrep envelope) - Streams stdout to `<projectDir>/tasks/<sessionId>/shell-<id>.output` (path layout aligns with the direction sketched in QwenLM#3471 review) - Bridges the external abort signal into the entry's AbortController so a single source of truth governs cancellation - Returns immediately with id + output path; agent's turn isn't blocked - Settles the registry entry asynchronously when ShellExecutionService resolves: complete (clean exit) / fail (error) / cancel (aborted) - Removes ~120 lines of dead bg-specific code from shell.ts: pgrep wrapping, '&' appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup - New `/bashes` slash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message. What this PR doesn't do - Footer pill / dialog integration — gated on QwenLM#3488 landing - task_stop / send_message integration — gated on QwenLM#3471 landing - Auto-backgrounding heuristics for long foreground bash — Phase D Test plan - 11 registry unit tests (state machine + idempotent terminal transitions) - 4 background-path tests in shell.test.ts (spawn no-wrap + complete / fail / cancel settle paths) - 2 /bashes command tests (empty + populated) - Full core suite: 247 files / 6075 passed (existing tests unaffected) * fix(core): address PR QwenLM#3642 review feedback Three [Critical] from the auto review + naming alignment with Claude Code: - shell.ts settle: non-zero exit code or termination signal now bucket into `failed` instead of `completed`. The previous `if (result.error) fail else complete()` would misreport `false` / failed `npm test` as success because ShellExecutionService surfaces ordinary command failures as a non-zero exitCode with `error: null`. Failure reason carries the exit code or signal so `/tasks` shows the real cause. - ShellExecutionService.childProcessFallback: add `streamStdout` mode that emits each decoded chunk through the existing onOutputEvent path. The default (foreground) path continues to buffer + emit the cleaned final blob, so existing in-line shell calls are unaffected. executeBackground opts in via `{ streamStdout: true }`, which is what makes the captured output file actually useful for long-running processes (dev servers, watchers) — without it the file stayed empty until the process exited. - shell.ts test fixture: cancel-settle test was using `signal: 'SIGTERM'` but `ShellExecutionResult.signal` is `number | null`. TS2322 broke the build; switched to `signal: null`. Added a test that explicitly covers the new "non-zero exit → failed" path so the bucketing change has regression coverage. - shell.ts comment: explicitly document why background shells force `shouldUseNodePty=false` (no terminal, no human; node-pty would be dead weight for fire-and-forget commands). - /bashes → /tasks (alias bashes), description "List and manage background tasks" — matches Claude Code's command name. Currently lists shells only; will surface other task kinds (subagents, monitor) as those registries land via QwenLM#3471 / QwenLM#3488. * fix(core): address PR QwenLM#3642 second-round review feedback - shellExecutionService streaming: drop stdout/stderr buffer + outputChunks accumulation in streaming mode. Each decoded chunk goes straight to onOutputEvent and is GC-eligible immediately. Long-running background commands (dev servers, watchers) no longer accumulate unbounded memory proportional to total output. Buffered (foreground) mode is unchanged. - shell.ts executeBackground: stripAnsi each chunk before writing to the output file. Dev servers / build tools spam color codes and cursor-move sequences that would render as garbage in the file the agent reads. - bashesCommand: command description "List and manage" → "List background tasks" — current implementation only supports listing, cancellation follows when the unified task_stop tool from QwenLM#3471 is wired in. Replace the hand-rolled formatRuntime helper with the shared formatDuration utility (uses hideTrailingZeros for parity with the previous output). - backgroundShellRegistry: add a comment documenting the lack of an eviction policy as a known limitation. LRU / age-based / capped-size eviction (and on-disk output rotation) is left as a follow-up alongside the broader output-file lifecycle story. * fix(core): address PR QwenLM#3642 third-round review feedback - shell.ts executeBackground: add 'error' listener on the output write stream. fs.createWriteStream surfaces write failures (disk full, permission, fs going away) as 'error' events; without a listener Node treats it as an uncaught exception and kills the entire CLI session. Log + drop is the sane default — the registry still settles via resultPromise so /tasks shows the right terminal status. - shell.ts executeBackground: store the abort handler reference and removeEventListener in the settle callback. Background shells outlive the turn signal; the dangling listener was keeping `entryAc` (and transitively `outputStream`) reachable until the turn signal itself was GC'd, which for long sessions would never happen. - shell.test.ts: extend the createWriteStream mock with an `on` stub so the new error-listener wiring doesn't crash the test suite. * refactor(cli): drop /bashes alias and rename file to tasksCommand Per follow-up review: the slash command should be exclusively /tasks. Removes the `bashes` altName, renames `bashesCommand{,.test}.ts` → `tasksCommand{,.test}.ts`, renames the exported binding `bashesCommand` → `tasksCommand`, and cleans up the remaining `/bashes` references in backgroundShellRegistry.ts comments. No behavior change beyond the alias removal. * refactor(cli): finish tasksCommand rename — apply content changes The previous commit (7b8b73b75) only captured the file rename via `git mv`; the export name change (`bashesCommand` → `tasksCommand`), the removal of `altNames: ['bashes']`, the import update in BuiltinCommandLoader, and the `/bashes` → `/tasks` comments in backgroundShellRegistry.ts were unstaged when that commit landed. Squash candidate before merge. * fix(core): address PR QwenLM#3642 fourth-round review feedback Four reviewer concerns from @wenshao + @doudouOUC: - [Critical] Config.shutdown() now also calls `backgroundShellRegistry.abortAll()`. Previously only the subagent registry was aborted, so a managed background shell could outlive the CLI process and orphan its child. Symmetric with how `BackgroundTaskRegistry.abortAll()` is wired in. - [P1] shell.ts executeBackground strips a trailing `&` from the command before spawn. The managed path is itself the backgrounding mechanism; forwarding `node server.js &` verbatim made bash exit immediately while the real child outlived the wrapper, causing the registry to settle as `completed` while the shell was still running and chunked output to land on a closed stream. Strip + warn. - [P2] Output file moves under `storage.getProjectTempDir()` (specifically `<projectTempDir>/background-shells/<sessionId>/shell-<id>.output`). `ReadFileTool` already auto-allows the project temp dir, so the LLM can `Read` the captured output without bouncing off a permission prompt — important because background-agent contexts can't surface interactive prompts. - [P2] Background shells are no longer killed when the current turn's AbortSignal fires. Forwarding the turn signal into the entry's AbortController meant a Ctrl+C on the turn would also terminate intentionally backgrounded dev servers / watchers, contradicting the independent-lifecycle promise. Cancellation now flows only through `entryAc` (driven by future `task_stop` integration via QwenLM#3471). Tests: - New `abortAll` registry tests cover running / mixed / empty cases. - `runs background commands as managed pool entries` test stops asserting the wrapper-vs-entry signal identity since they're now structurally separate (no turn-to-entry forwarding). - New `does not forward the turn signal into the background shell` test pins the new behavior. - New `strips trailing & from the spawned command` test pins the strip. - Removed the cancel-via-outer-signal settle test — that path no longer exists; cancellation is exercised end-to-end via the registry's own `cancel` and `abortAll` tests in `backgroundShellRegistry.test.ts`. * fix(core): tighten trailing & strip — narrow regex + ReDoS-safe Two reviewer concerns on the same line of QwenLM#3642 round 4: - [Critical CodeQL] `\s*&+\s*$` is a polynomial-time regex on uncontrolled input (long all-`&` strings backtrack quadratically). - [P2 doudouOUC] `&+` is too greedy: it also rewrites `npm run dev &&` into `npm run dev` (breaks logical AND syntax) and `echo foo \&` into `echo foo \` (eats the escaped literal). Only the bare bash background operator should be stripped. Replace the regex with a small linear-time helper `stripTrailingBackgroundAmp` that explicitly checks for the three "don't touch" cases (`&&`, `\&`, no trailing `&`). Plain `endsWith` / `slice` — no regex backtracking, and the intent reads off the page. Tests: - Existing strip-trailing-`&` test still passes. - New `does not strip a trailing &&` test pins the logical-AND case. - New `does not strip an escaped trailing \\&` test pins the escape case. * fix(core): keep binary-detection sniff in streaming mode @doudouOUC noted that `streamStdout` shortcut returned before the binary-sniff path, so a background command emitting binary bytes (`cat /bin/ls`, image dump, etc.) would be text-decoded and appended to the task output file unbounded. Restructure handleOutput so the sniff-and-cutover logic runs in both modes: - Both modes accumulate up to MAX_SNIFF_SIZE for the binary check. The accumulator is bounded; once the threshold is reached, it stops growing in streaming mode (dropped on binary detection / left inert on text confirmation) and continues to accumulate in buffered mode (existing foreground behavior). - Streaming mode emits 'binary_detected' as soon as `isBinary` trips so the consumer can stop writing the output file. Up to ~4KB of bytes may have been emitted as text chunks before detection — this is bounded and acceptable; the unbounded write is the pathology reviewers flagged. - Streaming text mode still emits each decoded chunk immediately and does not accumulate stdout/stderr strings, so long-running text streams remain GC-friendly. - Buffered (foreground) behavior is unchanged — the sniff accumulator is the same path the existing tests cover. Tests: 50 shellExecutionService + 11 backgroundShellRegistry + 57 shell.test.ts all pass; no regressions. * fix(core): tighten streaming sniff bound + Windows rmSync flake Two unrelated reds on the latest CI run: 1. [P1 doudouOUC] Streaming sniff buffer leaks on small chunks. The previous fix recomputed `sniffedBytes` from `Buffer.concat(outputChunks.slice(0, 20)).length` on every chunk — pinned to the first 20 chunks. If those total under MAX_SNIFF_SIZE (line-sized stdout, e.g. dev-server logs) the byte count never grew, the sniff branch stayed open forever, and `outputChunks` accumulated every later chunk — exactly the leak `streamStdout` was meant to prevent. Track sniffed bytes by running sum (`sniffedBytes += data.length`) so the bound is genuine. When sniff confirms text in streaming mode, drop the accumulator immediately so subsequent chunks fall through the streaming emit path without ever touching it. 2. file-exporters.test.ts afterEach `fs.rmSync` flaked on Windows (ENOTEMPTY: directory not empty). The exporter's underlying write stream hasn't always released its handle by the time `rmSync` runs. Pass `maxRetries: 5, retryDelay: 50` so the cleanup retries through the brief Windows handle-release window instead of failing the test on a CI quirk. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
TLDR
Adds background subagents: an Agent tool call with
run_in_background: true(or a subagent declaredbackground: truein frontmatter) launches asynchronously, returns immediately, and delivers its result to the parent as a notification when it finishes. This works uniformly across interactive, headless, and SDK consumers, with structured lifecycle events on the stream-json output so programmatic callers can track agents without parsing display text.Dive Deeper
Lifecycle
A background agent is registered with
BackgroundTaskRegistrythe moment its subagent finishes construction. It runs on anAbortControllerindependent from the parent turn, so ESC on the parent cancels the current turn only; background work continues. Registry teardown lives inConfig.shutdown()alongside other resource cleanup, which means headless runs get teardown for free.Completion, failure, and cancellation all pass through the same terminal path: register a notification on the shared queue, emit an SDK
task_notificationevent, stop.complete()/fail()are guarded so that a cancel during a live run doesn't produce a spurious success notification, andcancel()/abortAll()still emit a terminal notification so SDK consumers always see atask_notificationfor everytask_startedthey observed.Notification delivery
Notifications are typed (
SendMessageType.Notification) and queued rather than stringly encoded. They drain between model turns through a single-flight helper shared with cron: both producers push the same item shape onto the same queue, and one drain loop processes them in FIFO order. The display item shows a concise● ...line; the model receives a structured XML envelope carrying status, truncated result, usage stats, and (when available) the originating tool-use id.The envelope escapes XML metacharacters in every interpolated field before assembly. Subagent output may itself contain
</result>or</task-notification>, and an unescaped envelope would let that content forge sibling tags that the parent model would treat as trusted metadata. Truncation runs before escaping so it never slices through an entity.Permissions
Background agents cannot show interactive prompts, so the permission path is "ask hooks, then deny." Full ordering: L3/L4 allow/deny rules, approval-mode overrides (
auto-editfor edits), PermissionRequest hooks, then auto-deny if nothing decided. This replaces an earlier YOLO approach and matches claw-code'sshouldAvoidPermissionPromptsbehavior.Background agents inherit the parent's resolved approval mode, not the raw session config. A trusted-folder escalation from
defaulttoauto-editcarries through, so an edit-type tool still auto-approves in a background run; only non-edit confirmations (most commonly shell) hit the deny-by-default path.Headless support
Headless runs hold the process open via a terminal phase that polls registry state until no background agents remain. The drain runs as queued turns with full assistant-turn semantics — tool execution, approval updates, text streaming, error propagation — rather than a stripped-down variant, so stream-json approval callbacks reach permission-gated tools and provider errors produce non-zero exit codes. SIGINT/SIGTERM calls
registry.abortAll()so background agents stop promptly instead of pinning the process.One subtle correctness bug surfaced here: the single-flight drain originally cleared its in-flight reference from an async IIFE's
finally. When the queue was empty the IIFE had no awaits, so itsfinallyran synchronously during the assignment that set the reference, clearing it before the outer assignment overwrote it with a resolved promise — and every subsequent call short-circuited on the fulfilled reference. Moving the clear into a.finally()on the returned promise pushes it to a microtask, which runs after the outer assignment. Symptom wastask_startedwithouttask_notificationin headless JSON output; the process sat in hold-back until SIGTERM.SDK events
task_startedfires on registration,task_notificationon any terminal state. Both carrytask_id,status,usage, and optionaltool_use_id, letting consumers correlate background completions back to the Agent tool call that spawned them and track resource usage without parsing display text. The user-facing notification message is still emitted as a user-role history item for conversational continuity; the structured event is additive.Session resume
Notifications are recorded as user-role messages with
subtype: 'notification'(or'cron') and adisplayTextpayload, restoring both API history and the● ...display item on resume. The preprocessing skip is scoped narrowly — cron prompts still go through@-command, slash, and shell expansion; only true notifications bypass it — so resumed sessions behave identically to live ones.Forks
A background agent invoked through the fork dispatch path goes through
createForkSubagentrather than falling back to a plain headless agent. This preserves the parent's rendered system prompt, inherited history, and shared DashScope cache prefix. Forks are context-sharing extensions, not isolated subagents, so the general subagent exclusion list doesn't apply; recursion is still blocked by the ALS-based guard.Reviewer Test Plan
Build first:
npm run build && npm run bundle. The scenarios below exercise the three consumer surfaces and the trickiest interaction paths.Interactive happy path. Start a yolo-mode session. Ask the model to launch a background agent (Explore subagent, simple prompt). The tool widget should show
Running in background, the model should respond immediately, and within a few seconds a● Background agent "..." completed.line should appear followed by the model's summary of the result.Interactive cron rendering. With
QWEN_CODE_ENABLE_CRON=1, schedule a one-minute cron. The fire should render as a single● Cron: <label>line followed by the model turn — no duplicate>user message above it. Verify that a cron prompt containing@<path>still expands the file.Headless background agent. Run with
--output-format jsonand a prompt that launches a background agent. The process should exit 0 after the agent completes. The stream should containtask_started, a user message for the drain turn,task_notificationwith the structured payload, the drain-turn assistant response, andresult/success— in that order.Headless cancellation. Same setup, send SIGTERM during the hold-back phase. The process should exit non-zero within a couple of seconds rather than pinning until the agent finishes on its own.
Permissions in a trusted folder. From a default-mode interactive session, launch a background agent that performs both an edit and a shell command. The edit should succeed (resolved
auto-editinherited); the shell should be auto-denied (no UI available).Agent frontmatter flag. Create a subagent with
background: truein its frontmatter. Invoking it via Agent withoutrun_in_backgroundshould still run as a background task.Session resume. In an interactive session, let a background agent complete, then exit and resume. The
● ...notification line should reappear in scrollback and the model should have the notification in its API history.Unit suite:
cd packages/core && npx vitest run src/agents/background-tasks.test.ts src/tools/agent/agent.test.tscd packages/cli && npx vitest run src/nonInteractiveCli.test.ts src/ui/hooks/useGeminiStream.test.tsxTesting Matrix
Follow-up work
UI visibility — register background agents with
AgentViewContextso they appear as tabs in the tab bar with live output, status indicators, and keyboard navigation. Reuses the existing Arena multi-agent display infrastructure.Linked issues / bugs
shouldAvoidPermissionPromptswhich complements the mode resolution logic there.