feat(core): managed background shell pool with /tasks command#3642
Conversation
Replace shell.ts's `&` fork-and-detach background path with a managed process registry. Background shells now have observable lifecycle, captured output, and explicit cancellation — matching the pattern used by background subagents (#3076). Phase B from #3634 (background task management roadmap). What changes - New `BackgroundShellRegistry` (services/backgroundShellRegistry.ts): per-process entry with status (running / completed / failed / cancelled), AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op). Mirrors the lifecycle shape of #3471's BackgroundTaskRegistry so the two can be unified later. - `shell.ts` is_background path rewritten as `executeBackground`: - Spawns the unwrapped command (no '&', no pgrep envelope) - Streams stdout to `<projectDir>/tasks/<sessionId>/shell-<id>.output` (path layout aligns with the direction sketched in #3471 review) - Bridges the external abort signal into the entry's AbortController so a single source of truth governs cancellation - Returns immediately with id + output path; agent's turn isn't blocked - Settles the registry entry asynchronously when ShellExecutionService resolves: complete (clean exit) / fail (error) / cancel (aborted) - Removes ~120 lines of dead bg-specific code from shell.ts: pgrep wrapping, '&' appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup - New `/bashes` slash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message. What this PR doesn't do - Footer pill / dialog integration — gated on #3488 landing - task_stop / send_message integration — gated on #3471 landing - Auto-backgrounding heuristics for long foreground bash — Phase D Test plan - 11 registry unit tests (state machine + idempotent terminal transitions) - 4 background-path tests in shell.test.ts (spawn no-wrap + complete / fail / cancel settle paths) - 2 /bashes command tests (empty + populated) - Full core suite: 247 files / 6075 passed (existing tests unaffected)
- QwenLM/qwen-code#3642: BackgroundShellRegistry + /bashes (merge-after-nits) - cline/cline#10384: maxRetryAfter cap with test-cleanup hazard (request-changes) - aaif-goose/goose#8836: drop -i flag SIGTTOU fix (merge-as-is)
Three [Critical] from the auto review + naming alignment with Claude Code:
- shell.ts settle: non-zero exit code or termination signal now bucket into
`failed` instead of `completed`. The previous `if (result.error) fail else
complete()` would misreport `false` / failed `npm test` as success because
ShellExecutionService surfaces ordinary command failures as a non-zero
exitCode with `error: null`. Failure reason carries the exit code or signal
so `/tasks` shows the real cause.
- ShellExecutionService.childProcessFallback: add `streamStdout` mode that
emits each decoded chunk through the existing onOutputEvent path. The
default (foreground) path continues to buffer + emit the cleaned final
blob, so existing in-line shell calls are unaffected. executeBackground
opts in via `{ streamStdout: true }`, which is what makes the captured
output file actually useful for long-running processes (dev servers,
watchers) — without it the file stayed empty until the process exited.
- shell.ts test fixture: cancel-settle test was using `signal: 'SIGTERM'`
but `ShellExecutionResult.signal` is `number | null`. TS2322 broke the
build; switched to `signal: null`. Added a test that explicitly covers
the new "non-zero exit → failed" path so the bucketing change has
regression coverage.
- shell.ts comment: explicitly document why background shells force
`shouldUseNodePty=false` (no terminal, no human; node-pty would be dead
weight for fire-and-forget commands).
- /bashes → /tasks (alias bashes), description "List and manage background
tasks" — matches Claude Code's command name. Currently lists shells only;
will surface other task kinds (subagents, monitor) as those registries
land via #3471 / #3488.
|
谢谢 review。4 条全部修复 + 命名按 Claude Code 对齐,新增 1 个回归测试,已 push 到
另外按你提的 Claude Code 命名对齐, PR description 也补了 motivation / before-after / 设计决策 / migration 几段。 |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Thanks for the quick updates! The managed background shell pool is a great architectural improvement. While you've successfully added the streaming path to ShellExecutionService, my review found that it still accumulates output in memory, which poses an OOM risk for long-running background tasks. I've also noted a few suggestions regarding cancellation UI and log sanitization.
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Config.shutdown() calls this.backgroundTaskRegistry.abortAll() but never cleans up this.backgroundShellRegistry. While ShellExecutionService.cleanup() kills child processes at the OS level, the registry's AbortControllers are never signaled, and in-flight promise logic is not cleanly torn down. Consider adding an abortAll() method to BackgroundShellRegistry and calling it in Config.shutdown().
— glm-5.1 via Qwen Code /review
- shellExecutionService streaming: drop stdout/stderr buffer + outputChunks accumulation in streaming mode. Each decoded chunk goes straight to onOutputEvent and is GC-eligible immediately. Long-running background commands (dev servers, watchers) no longer accumulate unbounded memory proportional to total output. Buffered (foreground) mode is unchanged. - shell.ts executeBackground: stripAnsi each chunk before writing to the output file. Dev servers / build tools spam color codes and cursor-move sequences that would render as garbage in the file the agent reads. - bashesCommand: command description "List and manage" → "List background tasks" — current implementation only supports listing, cancellation follows when the unified task_stop tool from #3471 is wired in. Replace the hand-rolled formatRuntime helper with the shared formatDuration utility (uses hideTrailingZeros for parity with the previous output). - backgroundShellRegistry: add a comment documenting the lack of an eviction policy as a known limitation. LRU / age-based / capped-size eviction (and on-disk output rotation) is left as a follow-up alongside the broader output-file lifecycle story.
|
第二轮 review 5 条全部处理 + 测试通过,已 push 到
测试:core 全套 6075 通过,背景相关 65 个测试通过(11 registry + 54 shell.test 含 5 个 bg 路径测试 + 50 shellExecutionService)。lint + build 都干净。 |
- shell.ts executeBackground: add 'error' listener on the output write stream. fs.createWriteStream surfaces write failures (disk full, permission, fs going away) as 'error' events; without a listener Node treats it as an uncaught exception and kills the entire CLI session. Log + drop is the sane default — the registry still settles via resultPromise so /tasks shows the right terminal status. - shell.ts executeBackground: store the abort handler reference and removeEventListener in the settle callback. Background shells outlive the turn signal; the dangling listener was keeping `entryAc` (and transitively `outputStream`) reachable until the turn signal itself was GC'd, which for long sessions would never happen. - shell.test.ts: extend the createWriteStream mock with an `on` stub so the new error-listener wiring doesn't crash the test suite.
|
第三轮 review 处理完,push 到
测试:core 全套 6075 通过,背景相关 65 个测试全过,lint + build clean。 |
Per follow-up review: the slash command should be exclusively /tasks.
Removes the `bashes` altName, renames `bashesCommand{,.test}.ts` →
`tasksCommand{,.test}.ts`, renames the exported binding `bashesCommand`
→ `tasksCommand`, and cleans up the remaining `/bashes` references in
backgroundShellRegistry.ts comments. No behavior change beyond the
alias removal.
The previous commit (03c8503) only captured the file rename via `git mv`; the export name change (`bashesCommand` → `tasksCommand`), the removal of `altNames: ['bashes']`, the import update in BuiltinCommandLoader, and the `/bashes` → `/tasks` comments in backgroundShellRegistry.ts were unstaged when that commit landed. Squash candidate before merge.
|
移除
两个 commit: 测试 / build / lint 全过。 |
Four reviewer concerns from @wenshao + @doudouOUC: - [Critical] Config.shutdown() now also calls `backgroundShellRegistry.abortAll()`. Previously only the subagent registry was aborted, so a managed background shell could outlive the CLI process and orphan its child. Symmetric with how `BackgroundTaskRegistry.abortAll()` is wired in. - [P1] shell.ts executeBackground strips a trailing `&` from the command before spawn. The managed path is itself the backgrounding mechanism; forwarding `node server.js &` verbatim made bash exit immediately while the real child outlived the wrapper, causing the registry to settle as `completed` while the shell was still running and chunked output to land on a closed stream. Strip + warn. - [P2] Output file moves under `storage.getProjectTempDir()` (specifically `<projectTempDir>/background-shells/<sessionId>/shell-<id>.output`). `ReadFileTool` already auto-allows the project temp dir, so the LLM can `Read` the captured output without bouncing off a permission prompt — important because background-agent contexts can't surface interactive prompts. - [P2] Background shells are no longer killed when the current turn's AbortSignal fires. Forwarding the turn signal into the entry's AbortController meant a Ctrl+C on the turn would also terminate intentionally backgrounded dev servers / watchers, contradicting the independent-lifecycle promise. Cancellation now flows only through `entryAc` (driven by future `task_stop` integration via #3471). Tests: - New `abortAll` registry tests cover running / mixed / empty cases. - `runs background commands as managed pool entries` test stops asserting the wrapper-vs-entry signal identity since they're now structurally separate (no turn-to-entry forwarding). - New `does not forward the turn signal into the background shell` test pins the new behavior. - New `strips trailing & from the spawned command` test pins the strip. - Removed the cancel-via-outer-signal settle test — that path no longer exists; cancellation is exercised end-to-end via the registry's own `cancel` and `abortAll` tests in `backgroundShellRegistry.test.ts`.
|
第四轮 review 4 条全部修复,push 到
测试:core 全过,11 → 13 registry tests,55 shell tests,50 shellExecutionService 不受影响。lint + build 干净。 |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gemini-3.1-pro-preview via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gemini-3.1-pro-preview via Qwen Code /review
Two reviewer concerns on the same line of #3642 round 4: - [Critical CodeQL] `\s*&+\s*$` is a polynomial-time regex on uncontrolled input (long all-`&` strings backtrack quadratically). - [P2 doudouOUC] `&+` is too greedy: it also rewrites `npm run dev &&` into `npm run dev` (breaks logical AND syntax) and `echo foo \&` into `echo foo \` (eats the escaped literal). Only the bare bash background operator should be stripped. Replace the regex with a small linear-time helper `stripTrailingBackgroundAmp` that explicitly checks for the three "don't touch" cases (`&&`, `\&`, no trailing `&`). Plain `endsWith` / `slice` — no regex backtracking, and the intent reads off the page. Tests: - Existing strip-trailing-`&` test still passes. - New `does not strip a trailing &&` test pins the logical-AND case. - New `does not strip an escaped trailing \\&` test pins the escape case.
|
第五轮 review (CodeQL + @doudouOUC) — 同一行的两个反馈,一并修复,push 到
helper 实现就这点: function stripTrailingBackgroundAmp(command: string): string {
const trimmed = command.trimEnd();
if (!trimmed.endsWith('&')) return command;
if (trimmed.endsWith('&&')) return command;
if (trimmed.endsWith('\\&')) return command;
return trimmed.slice(0, -1).trimEnd();
}测试:core 全过,shell.test.ts 57 个(之前 55 + 2 新)。 |
@doudouOUC noted that `streamStdout` shortcut returned before the binary-sniff path, so a background command emitting binary bytes (`cat /bin/ls`, image dump, etc.) would be text-decoded and appended to the task output file unbounded. Restructure handleOutput so the sniff-and-cutover logic runs in both modes: - Both modes accumulate up to MAX_SNIFF_SIZE for the binary check. The accumulator is bounded; once the threshold is reached, it stops growing in streaming mode (dropped on binary detection / left inert on text confirmation) and continues to accumulate in buffered mode (existing foreground behavior). - Streaming mode emits 'binary_detected' as soon as `isBinary` trips so the consumer can stop writing the output file. Up to ~4KB of bytes may have been emitted as text chunks before detection — this is bounded and acceptable; the unbounded write is the pathology reviewers flagged. - Streaming text mode still emits each decoded chunk immediately and does not accumulate stdout/stderr strings, so long-running text streams remain GC-friendly. - Buffered (foreground) behavior is unchanged — the sniff accumulator is the same path the existing tests cover. Tests: 50 shellExecutionService + 11 backgroundShellRegistry + 57 shell.test.ts all pass; no regressions.
|
第六轮 review (@doudouOUC) — 第二轮我把 streaming mode 的 binary sniff 跳过了换 memory 不爆,绕过了 binary detection。修复,push 到
Trade-off 透明:sniff 期间最多 ~4KB 已经 emit 给 outputStream(在 detection 触发前),文件里前几 KB 可能是 binary garbage 但 bounded。reviewer 担心的"无限 garbage"行为消除。 Foreground (buffered) mode 行为不变,50 个 shellExecutionService.test.ts 不受影响。 |
Two unrelated reds on the latest CI run: 1. [P1 doudouOUC] Streaming sniff buffer leaks on small chunks. The previous fix recomputed `sniffedBytes` from `Buffer.concat(outputChunks.slice(0, 20)).length` on every chunk — pinned to the first 20 chunks. If those total under MAX_SNIFF_SIZE (line-sized stdout, e.g. dev-server logs) the byte count never grew, the sniff branch stayed open forever, and `outputChunks` accumulated every later chunk — exactly the leak `streamStdout` was meant to prevent. Track sniffed bytes by running sum (`sniffedBytes += data.length`) so the bound is genuine. When sniff confirms text in streaming mode, drop the accumulator immediately so subsequent chunks fall through the streaming emit path without ever touching it. 2. file-exporters.test.ts afterEach `fs.rmSync` flaked on Windows (ENOTEMPTY: directory not empty). The exporter's underlying write stream hasn't always released its handle by the time `rmSync` runs. Pass `maxRetries: 5, retryDelay: 50` so the cleanup retries through the brief Windows handle-release window instead of failing the test on a CI quirk.
|
第七轮 — push 到
测试:4 文件 121 通过(shellExecutionService + backgroundShellRegistry + shell + file-exporters)。 |
doudouOUC
left a comment
There was a problem hiding this comment.
LGTM, thanks for addressing the feedback. No further issues from my side.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
This PR replaces the legacy &-append + pgrep-wrap background path with a managed BackgroundShellRegistry — clean idempotent state machine, the entry's own AbortController as cancellation source of truth, and abortAll() wired into Config.cleanup(). Quiet wins: Windows background shells now actually work via ShellExecutionService, and the new stripTrailingBackgroundAmp is a real ReDoS-safe replacement.
Future work
/tasks should land in the combined tasks dialog rather than dumping text into the conversation. PR #3488 adds the user-facing dialog (status-line pill + composer-Down overlay) and explicitly designs it as the home for "future background work (shells, MCP monitors, remote sessions) as additional sections." Once #3488 merges, the right surface for this PR's listing is a "Background shells" section in that dialog, not a slash command that prints text into the chat. The current text-dump approach is fine as an interim — agents (and users) can still inspect state — but worth flagging so it's not the permanent UX. Suggestion: keep /tasks for now, plan a follow-up after #3488 lands to fold this into the dialog and (optionally) deprecate the slash command.
Verdict
COMMENT — no blockers. Lifecycle and cancellation design is solid; main note is the UX direction once the background-agent dialog lands.
Conflicts resolved: - config.ts: keep both MonitorRegistry and BackgroundShellRegistry (import, instantiation, shutdown abortAll, accessor) - index.ts: keep both monitorRegistry and backgroundShellRegistry exports Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from #3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
… C) (#3684) * feat(core): event monitor tool with throttled stdout streaming (Phase C) Add a new Monitor tool that spawns a long-running shell command and streams its stdout lines back to the agent as event notifications. This is Phase C from the background task management roadmap (#3634, #3666). What changes: - New MonitorRegistry (services/monitorRegistry.ts): per-monitor entry with lifecycle (running/completed/failed/cancelled), idle timeout auto-stop, max events auto-stop, AbortController-based cancellation. Follows the same structural pattern as BackgroundTaskRegistry. - New Monitor tool (tools/monitor.ts): spawns via child_process.spawn with independent AbortController (Ctrl+C won't kill monitors), separate stdout/stderr line buffers, token-bucket throttling (burst=5, sustain=1/s). Returns immediately with monitor ID; events stream as notifications. - Sleep interception in shell.ts: detectBlockedSleepPattern() blocks foreground `sleep N` (N>=2) and guides model to use Monitor or is_background instead. - Config integration: MonitorRegistry instantiation, accessor, shutdown cleanup (abortAll), lazy tool registration. - CLI wiring: notification callbacks in useGeminiStream.ts (interactive) and nonInteractiveCli.ts (headless), including hold-back loop abort on exit and SIGINT cleanup. What this PR doesn't do (gated on #3471/#3488): - Footer pill / dialog integration - task_stop / send_message integration Test plan: - 21 MonitorRegistry unit tests (lifecycle, idle timeout, max events, XML escaping, nonexistent ID guard, callback clearing) - 20 Monitor tool unit tests (validation, spawn, line buffering, separate stdout/stderr buffers, throttling, signal-killed path, turn isolation) - 7 detectBlockedSleepPattern unit tests - 2 E2E tests (monitor invocation, sleep interception) - Full core suite: 248 files / 6151 passed Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): hold-back loop waits for monitors + emit task_started for SDK Two fixes from Codex review: P1: The non-interactive hold-back loop now includes monitorRegistry.getRunning() in its wait condition, so monitors can stream events before the CLI exits. Previously monitors were aborted immediately after the agent's first reply. P2: MonitorRegistry gains setRegisterCallback(), and nonInteractiveCli wires it to emit task_started system messages. Stream-json/SDK consumers now see a task_started for each monitor, matching the backgroundTaskRegistry contract. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): Windows process kill + pipeline sleep false-positive Two fixes from Codex review: P1: Monitor abort handler now uses `taskkill /f /t` on Windows instead of POSIX-only `process.kill(-pid)`. Follows the existing pattern in ShellExecutionService.childProcessFallback. P2: detectBlockedSleepPattern no longer uses splitCommands (which splits on `|` pipes). Replaced with a regex that only matches sleep followed by sequential separators (&&, ||, ;, &, newline), not pipes. `sleep 5 | cat` is now correctly allowed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): resolve TS errors in monitor.test.ts mock types Use Object.defineProperty for readonly ChildProcess.pid and proper Readable type for stdout/stderr mocks to satisfy strict tsc builds. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): remove false notification promise + add early-abort guard P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from #3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to useGeminiStream tests The useGeminiStream hook now calls config.getMonitorRegistry() to wire up monitor notification callbacks. The test mock config was missing this method, causing 64 test failures with "config.getMonitorRegistry is not a function". Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to nonInteractiveCli tests Same fix as useGeminiStream.test.tsx — the mock config needs getMonitorRegistry to avoid "is not a function" errors (29 failures). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review — CORE_TOOLS, directory param, test fix 1. Add 'monitor' to PermissionManager.CORE_TOOLS so coreTools allowlist correctly gates the monitor tool (same as run_shell_command). 2. Add optional 'directory' parameter to MonitorTool with workspace validation, mirroring ShellTool's directory support for multi-root workspaces. 3. Fix sleep-interception E2E test: readToolLogs() doesn't expose toolResult, so the old assertion was dead code. Now verifies via the model's output text instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address MonitorTool review #4186888042 Addresses three [Critical] review comments on packages/core/src/tools/monitor.ts: 1. Partial-line buffer unbounded growth (processLines) MAX_LINE_LENGTH was only enforced after a newline, so a command emitting a long stream without newlines would grow buffer.value without bound and re-split the entire accumulated string on every chunk. Now, when the buffer has no newline and exceeds MAX_LINE_LENGTH, we force-emit a single truncated event through the throttled path and reset the buffer. 2. Missing type guard on params.command validateToolParamValues called params.command.trim() without a typeof check. Schema validation normally catches this, but SDK/direct callers could bypass it and hit an uncaught TypeError. Added typeof === 'string' guard, matching the pattern used for max_events / idle_timeout_ms. 3. Workspace check bypass via raw startsWith The directory validator used workspaceDirs.some(d => params.directory .startsWith(d)), which allowed prefix collisions (e.g. /tmp/project-evil against a /tmp/project workspace) and skipped canonicalisation / symlink resolution. Switched to WorkspaceContext.isPathWithinWorkspace, which already does fullyResolvedPath + segment-aware isPathWithinRoot matching and is the standard used elsewhere in the codebase. Test coverage: added 6 unit tests covering non-string command guard, non-absolute directory rejection, prefix-collision rejection, traversal rejection, workspace acceptance, and partial-line cap behaviour (including buffer reset). All 26 monitor.test.ts cases pass. The same startsWith pattern also exists in ShellTool and is tracked as a separate follow-up to keep this PR focused on Phase C scope. * fix(core): scope monitor always-allow permissions Populate Monitor confirmation permissionRules using the same command-rule extraction path as ShellTool, so ProceedAlways persists command-scoped Bash(...) rules instead of a broad monitor-level allow. Also add unit coverage for command-scoped rules, filtering already-allowed subcommands, and extractor fallback behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): decouple monitor permission scope from Bash rules Remove pm.isCommandAllowed() from MonitorToolInvocation.getConfirmationDetails() to prevent existing Bash(...) allow rules from shrinking the monitor confirmation scope. Monitor is a long-running background process with a different risk profile than one-shot shell execution and should maintain its own permission boundary. Only AST-based read-only filtering is retained. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): unify monitor error/exit cleanup to prevent resource leaks Extract a shared cleanup() helper called from both the `exit` and `error` event handlers. Previously the `error` handler did not flush buffers, clear buffer values, remove the abort listener, or log dropped-line stats — causing potential memory leaks when `error` fires without a subsequent `exit` (e.g. ENOENT for missing commands). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): add user-skills-directory guard to monitor directory validation Mirror ShellTool's getUserSkillsDirs() check in MonitorTool's validateToolParamValues() to prevent monitor commands from running inside user skills directories. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(core): add Monitor(...) permission namespace for monitor tool (#3726) Introduce a dedicated Monitor(...) permission namespace so monitor and shell tools have independent permission boundaries. Previously monitor emitted Bash(...) rules, causing "Always Allow" to fail for future monitor invocations while unintentionally granting run_shell_command. Changes: - rule-parser.ts: add Monitor alias, SHELL_TOOL_NAMES entry, CANONICAL_TO_RULE_DISPLAY, DISPLAY_NAME_TO_VERB - permission-manager.ts: extract SHELL_LIKE_TOOLS set so evaluate(), evaluateSingle(), hasRelevantRules(), hasMatchingAskRule() handle both run_shell_command and monitor - monitor.ts: emit Monitor(...) instead of Bash(...) in permissionRules - Tests: parseRule, matchesRule, cross-tool isolation regression, buildPermissionRules, buildHumanReadableRuleLabel for Monitor Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): decouple headless monitor lifetime from final result Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): stabilize stream-json monitor session shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): deny monitor in headless approval defaults Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): honor tool aliases in headless allow checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address opus review — sleep regex, monitor cap, non-interactive cleanup - Fix sleep interception false positive for backgrounded sleep (`sleep 5 & echo done`). Remove bare `&` from separator character class so the background operator is not treated as a sequential separator. - Add MAX_CONCURRENT_MONITORS (16) check in MonitorRegistry.register() and early rejection in MonitorTool.execute() to prevent unbounded process spawning. - Widen monitorId from 8 to 16 hex chars to reduce birthday collision risk. - Abort all running monitors in nonInteractiveCli.ts success-path finally so piped stdio refs don't keep the Node event loop alive after result emission in one-shot (--print) mode. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors and background shells on /clear Without this, long-running monitors from a previous session survive /clear and continue pushing events into the new session's notification queue. This enables cross-session prompt injection where a malicious monitor persists across the user's escape hatch. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors on stream-json session shutdown Call monitorRegistry.abortAll() in both shutdown() and drainAndShutdown() so detached monitor child processes don't survive session termination. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(cli): use content event type in stream tests Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): isolate session cleanup on clear and shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): finalize session cleanup after drain Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close remaining monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve shell cwd in virtual permission checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize trailing background ampersands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor permission and wrapper handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): make monitor CI assertions cross-platform Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor wrapper normalization Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize wrapped monitor commands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor headless edge cases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve monitor spawn errors Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor register cleanup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): parse monitor wrapper script token Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review comments for monitor tool - Make Bash(...) permission rules cover monitor via toolMatchesRuleToolName, so deny rules like Bash(rm *) also block monitor({command: "rm ..."}) - Remove dead `normalizeRuleToolName` mock reference in config.test.ts - Fix tool description to mention stdout/stderr instead of just stdout - Export MAX_CONCURRENT_MONITORS from monitorRegistry and use it in monitor.ts instead of hardcoded 16 - Rename ambiguous MAX_LINE_LENGTH constants: PARTIAL_LINE_BUFFER_CAP (4096, monitor.ts) and EVENT_LINE_TRUNCATE (2000, monitorRegistry.ts) - Fix schema description text: "Max 80 characters" → "Truncated to 80 characters in display" - Add .unref() to SIGTERM→SIGKILL escalation timer to prevent 200ms exit delay Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): resolve clear command typecheck issues Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve background tasks across shutdown abort Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address latest monitor review comments Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): handle monitors across session switches Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): cover aborted monitor startup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address remaining monitor PR review comments Adopts four unresolved review threads on PR #3684: * shell: trim top-level trailing comments before validating sleep separator so 'sleep 5 # wait' no longer bypasses detectBlockedSleepPattern. * monitor: add sanitizeMonitorLine to strip C0/C1 control chars (except tab) and defang structural envelope tag names with U+200B before forwarding output to the model, blocking prompt-injection attempts hidden in monitored stdout/stderr. * monitor: declare line buffers and throttledEmit before abortHandler to avoid TDZ on synchronous abort paths, and add flushPartialLineBuffers called from both abortHandler (before kill) and cleanup (natural exit/error) so partial-line data is no longer silently dropped on cancel. * permissions: document that normalizePermissionContext relies on buildPermissionCheckContext to forward monitor's directory as cwd, and add regression tests proving relative-path Read(./...) allow and deny rules resolve against the monitor's explicit cwd. * fix(core): abort running monitors in MonitorRegistry.reset() reset() previously only cleared idle timers and emptied the map without aborting running monitors' AbortControllers. This could orphan child processes when reset() was called without a prior abortAll(), e.g. via useResumeCommand → resetBackgroundStateForSessionSwitch. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core): harden monitor notification XML and displayText - Extend escapeXml to escape " and ' as defense-in-depth: safe to reuse the helper in any future XML attribute context without re-auditing. - Strip C0 (except tab) and C1 control characters from the displayText surface before interpolation, so untrusted child-process output cannot leak ANSI escapes / NUL bytes into the operator's terminal even if a direct caller of MonitorRegistry.emitEvent skips sanitization. Adds unit tests for both hardening paths. * test(core): cover token-bucket throttling and commented-sleep bypass - Add 4 unit tests for the monitor token-bucket throttle (burst=5, 1 token/sec refill): burst cap, refill release, long-idle bucket cap, and whitespace lines not consuming budget. Uses vi.setSystemTime to exercise Date.now() without advancing pending setTimeouts. - Add an E2E case that feeds 'sleep 5 # wait for db' through the shell tool to lock in trimTrailingShellComment behavior end-to-end; the unit-level coverage in shell.test.ts remains authoritative but the E2E anchor prevents a regression from silently passing unit tests. * fix(core): address 3 remaining copilot review comments 1. shell.ts sleep interception: strip shell wrapper before detecting the blocked sleep pattern so `bash -c 'sleep 5'` / `sh -c ...` cannot route around the block. Mirrors every other sensitive check in shell.ts, which already normalizes through stripShellWrapper. 2. monitorRegistry.ts emitEvent auto-stop: settle the entry BEFORE aborting its controller so that any synchronous abort listener that flushes buffered output back through registry.emitEvent() (e.g. the Monitor tool's flushPartialLineBuffers) finds status !== 'running' and short-circuits instead of overshooting maxEvents and emitting a duplicate 'Max events reached' terminal notification. 3. monitorRegistry.ts truncateDescription: cap output at exactly MAX_DESCRIPTION_LENGTH by counting the ellipsis against the budget, instead of returning MAX_DESCRIPTION_LENGTH + 3 characters. Each fix is covered by a new unit test. * fix(core): address review comments — sanitize, notify, kill logging, throttle observability - Remove double normalize in buildPermissionCheckContext (PM is single source) - Add {notify:false} to Config.shutdown() and abortTaskRegistries() abortAll - Swap settle-before-abort in cancel() and resetIdleTimer() to prevent races - Add stripDisplayControlChars to emitTerminalNotification - Sanitize monitor description at entry creation via sanitizeMonitorLine - Surface throttle-dropped line count in terminal notification - Add .unref() to idle timer to allow clean process exit - Add error handler + stdio:ignore to Windows taskkill spawn - Log SIGTERM/SIGKILL kill failures via debugLogger.warn - Attach early child error handler to cover spawn-to-register window - Destroy child stdio on register failure to prevent handle leaks - Improve stripShellWrapper to handle absolute paths, combined flags, env prefix - Improve SHELL_TOOL_NAMES documentation and toolMatchesRuleToolName clarity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): resolve monitor tool typecheck errors - Cast child.stdout/stderr to a minimal { destroy?: () => void } shape so the optional destroy() call compiles and still works with test mocks. - Initialize droppedLines: 0 in MonitorEntry test fixtures that predate the field becoming required. * fix(monitor): add missing stdio option in taskkill test assertions (#3784) * fix(core): address monitor review feedback * fix(core): harden monitor command lifecycle --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Captures current state of the bg-agent subsystem in the fork (what's already wired, what is not), maps the upstream qwen-code PRs we have not yet ported (QwenLM#3076 → QwenLM#3739), and sketches three phases to close the gap: - Phase A: model-facing agent control + event monitor (QwenLM#3471, QwenLM#3684, QwenLM#3687) - Phase B: TUI surface + /tasks command (QwenLM#3488, QwenLM#3642) - Phase C: cross-session resume (QwenLM#3739) Also calls out cross-cutting decisions we should make before Phase A lands: settings layout, stop-tool naming, persistence shape, gateway validation. This is a planning doc, not a spec. Per-phase code-level designs come later. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… C) (#3684) * feat(core): event monitor tool with throttled stdout streaming (Phase C) Add a new Monitor tool that spawns a long-running shell command and streams its stdout lines back to the agent as event notifications. This is Phase C from the background task management roadmap (#3634, #3666). What changes: - New MonitorRegistry (services/monitorRegistry.ts): per-monitor entry with lifecycle (running/completed/failed/cancelled), idle timeout auto-stop, max events auto-stop, AbortController-based cancellation. Follows the same structural pattern as BackgroundTaskRegistry. - New Monitor tool (tools/monitor.ts): spawns via child_process.spawn with independent AbortController (Ctrl+C won't kill monitors), separate stdout/stderr line buffers, token-bucket throttling (burst=5, sustain=1/s). Returns immediately with monitor ID; events stream as notifications. - Sleep interception in shell.ts: detectBlockedSleepPattern() blocks foreground `sleep N` (N>=2) and guides model to use Monitor or is_background instead. - Config integration: MonitorRegistry instantiation, accessor, shutdown cleanup (abortAll), lazy tool registration. - CLI wiring: notification callbacks in useGeminiStream.ts (interactive) and nonInteractiveCli.ts (headless), including hold-back loop abort on exit and SIGINT cleanup. What this PR doesn't do (gated on #3471/#3488): - Footer pill / dialog integration - task_stop / send_message integration Test plan: - 21 MonitorRegistry unit tests (lifecycle, idle timeout, max events, XML escaping, nonexistent ID guard, callback clearing) - 20 Monitor tool unit tests (validation, spawn, line buffering, separate stdout/stderr buffers, throttling, signal-killed path, turn isolation) - 7 detectBlockedSleepPattern unit tests - 2 E2E tests (monitor invocation, sleep interception) - Full core suite: 248 files / 6151 passed Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): hold-back loop waits for monitors + emit task_started for SDK Two fixes from Codex review: P1: The non-interactive hold-back loop now includes monitorRegistry.getRunning() in its wait condition, so monitors can stream events before the CLI exits. Previously monitors were aborted immediately after the agent's first reply. P2: MonitorRegistry gains setRegisterCallback(), and nonInteractiveCli wires it to emit task_started system messages. Stream-json/SDK consumers now see a task_started for each monitor, matching the backgroundTaskRegistry contract. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): Windows process kill + pipeline sleep false-positive Two fixes from Codex review: P1: Monitor abort handler now uses `taskkill /f /t` on Windows instead of POSIX-only `process.kill(-pid)`. Follows the existing pattern in ShellExecutionService.childProcessFallback. P2: detectBlockedSleepPattern no longer uses splitCommands (which splits on `|` pipes). Replaced with a regex that only matches sleep followed by sequential separators (&&, ||, ;, &, newline), not pipes. `sleep 5 | cat` is now correctly allowed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): resolve TS errors in monitor.test.ts mock types Use Object.defineProperty for readonly ChildProcess.pid and proper Readable type for stdout/stderr mocks to satisfy strict tsc builds. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): remove false notification promise + add early-abort guard P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from #3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to useGeminiStream tests The useGeminiStream hook now calls config.getMonitorRegistry() to wire up monitor notification callbacks. The test mock config was missing this method, causing 64 test failures with "config.getMonitorRegistry is not a function". Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to nonInteractiveCli tests Same fix as useGeminiStream.test.tsx — the mock config needs getMonitorRegistry to avoid "is not a function" errors (29 failures). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review — CORE_TOOLS, directory param, test fix 1. Add 'monitor' to PermissionManager.CORE_TOOLS so coreTools allowlist correctly gates the monitor tool (same as run_shell_command). 2. Add optional 'directory' parameter to MonitorTool with workspace validation, mirroring ShellTool's directory support for multi-root workspaces. 3. Fix sleep-interception E2E test: readToolLogs() doesn't expose toolResult, so the old assertion was dead code. Now verifies via the model's output text instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address MonitorTool review #4186888042 Addresses three [Critical] review comments on packages/core/src/tools/monitor.ts: 1. Partial-line buffer unbounded growth (processLines) MAX_LINE_LENGTH was only enforced after a newline, so a command emitting a long stream without newlines would grow buffer.value without bound and re-split the entire accumulated string on every chunk. Now, when the buffer has no newline and exceeds MAX_LINE_LENGTH, we force-emit a single truncated event through the throttled path and reset the buffer. 2. Missing type guard on params.command validateToolParamValues called params.command.trim() without a typeof check. Schema validation normally catches this, but SDK/direct callers could bypass it and hit an uncaught TypeError. Added typeof === 'string' guard, matching the pattern used for max_events / idle_timeout_ms. 3. Workspace check bypass via raw startsWith The directory validator used workspaceDirs.some(d => params.directory .startsWith(d)), which allowed prefix collisions (e.g. /tmp/project-evil against a /tmp/project workspace) and skipped canonicalisation / symlink resolution. Switched to WorkspaceContext.isPathWithinWorkspace, which already does fullyResolvedPath + segment-aware isPathWithinRoot matching and is the standard used elsewhere in the codebase. Test coverage: added 6 unit tests covering non-string command guard, non-absolute directory rejection, prefix-collision rejection, traversal rejection, workspace acceptance, and partial-line cap behaviour (including buffer reset). All 26 monitor.test.ts cases pass. The same startsWith pattern also exists in ShellTool and is tracked as a separate follow-up to keep this PR focused on Phase C scope. * fix(core): scope monitor always-allow permissions Populate Monitor confirmation permissionRules using the same command-rule extraction path as ShellTool, so ProceedAlways persists command-scoped Bash(...) rules instead of a broad monitor-level allow. Also add unit coverage for command-scoped rules, filtering already-allowed subcommands, and extractor fallback behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): decouple monitor permission scope from Bash rules Remove pm.isCommandAllowed() from MonitorToolInvocation.getConfirmationDetails() to prevent existing Bash(...) allow rules from shrinking the monitor confirmation scope. Monitor is a long-running background process with a different risk profile than one-shot shell execution and should maintain its own permission boundary. Only AST-based read-only filtering is retained. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): unify monitor error/exit cleanup to prevent resource leaks Extract a shared cleanup() helper called from both the `exit` and `error` event handlers. Previously the `error` handler did not flush buffers, clear buffer values, remove the abort listener, or log dropped-line stats — causing potential memory leaks when `error` fires without a subsequent `exit` (e.g. ENOENT for missing commands). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): add user-skills-directory guard to monitor directory validation Mirror ShellTool's getUserSkillsDirs() check in MonitorTool's validateToolParamValues() to prevent monitor commands from running inside user skills directories. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(core): add Monitor(...) permission namespace for monitor tool (#3726) Introduce a dedicated Monitor(...) permission namespace so monitor and shell tools have independent permission boundaries. Previously monitor emitted Bash(...) rules, causing "Always Allow" to fail for future monitor invocations while unintentionally granting run_shell_command. Changes: - rule-parser.ts: add Monitor alias, SHELL_TOOL_NAMES entry, CANONICAL_TO_RULE_DISPLAY, DISPLAY_NAME_TO_VERB - permission-manager.ts: extract SHELL_LIKE_TOOLS set so evaluate(), evaluateSingle(), hasRelevantRules(), hasMatchingAskRule() handle both run_shell_command and monitor - monitor.ts: emit Monitor(...) instead of Bash(...) in permissionRules - Tests: parseRule, matchesRule, cross-tool isolation regression, buildPermissionRules, buildHumanReadableRuleLabel for Monitor Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): decouple headless monitor lifetime from final result Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): stabilize stream-json monitor session shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): deny monitor in headless approval defaults Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): honor tool aliases in headless allow checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address opus review — sleep regex, monitor cap, non-interactive cleanup - Fix sleep interception false positive for backgrounded sleep (`sleep 5 & echo done`). Remove bare `&` from separator character class so the background operator is not treated as a sequential separator. - Add MAX_CONCURRENT_MONITORS (16) check in MonitorRegistry.register() and early rejection in MonitorTool.execute() to prevent unbounded process spawning. - Widen monitorId from 8 to 16 hex chars to reduce birthday collision risk. - Abort all running monitors in nonInteractiveCli.ts success-path finally so piped stdio refs don't keep the Node event loop alive after result emission in one-shot (--print) mode. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors and background shells on /clear Without this, long-running monitors from a previous session survive /clear and continue pushing events into the new session's notification queue. This enables cross-session prompt injection where a malicious monitor persists across the user's escape hatch. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors on stream-json session shutdown Call monitorRegistry.abortAll() in both shutdown() and drainAndShutdown() so detached monitor child processes don't survive session termination. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(cli): use content event type in stream tests Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): isolate session cleanup on clear and shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): finalize session cleanup after drain Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close remaining monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve shell cwd in virtual permission checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize trailing background ampersands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor permission and wrapper handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): make monitor CI assertions cross-platform Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor wrapper normalization Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize wrapped monitor commands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor headless edge cases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve monitor spawn errors Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor register cleanup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): parse monitor wrapper script token Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review comments for monitor tool - Make Bash(...) permission rules cover monitor via toolMatchesRuleToolName, so deny rules like Bash(rm *) also block monitor({command: "rm ..."}) - Remove dead `normalizeRuleToolName` mock reference in config.test.ts - Fix tool description to mention stdout/stderr instead of just stdout - Export MAX_CONCURRENT_MONITORS from monitorRegistry and use it in monitor.ts instead of hardcoded 16 - Rename ambiguous MAX_LINE_LENGTH constants: PARTIAL_LINE_BUFFER_CAP (4096, monitor.ts) and EVENT_LINE_TRUNCATE (2000, monitorRegistry.ts) - Fix schema description text: "Max 80 characters" → "Truncated to 80 characters in display" - Add .unref() to SIGTERM→SIGKILL escalation timer to prevent 200ms exit delay Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): resolve clear command typecheck issues Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve background tasks across shutdown abort Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address latest monitor review comments Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): handle monitors across session switches Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): cover aborted monitor startup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address remaining monitor PR review comments Adopts four unresolved review threads on PR #3684: * shell: trim top-level trailing comments before validating sleep separator so 'sleep 5 # wait' no longer bypasses detectBlockedSleepPattern. * monitor: add sanitizeMonitorLine to strip C0/C1 control chars (except tab) and defang structural envelope tag names with U+200B before forwarding output to the model, blocking prompt-injection attempts hidden in monitored stdout/stderr. * monitor: declare line buffers and throttledEmit before abortHandler to avoid TDZ on synchronous abort paths, and add flushPartialLineBuffers called from both abortHandler (before kill) and cleanup (natural exit/error) so partial-line data is no longer silently dropped on cancel. * permissions: document that normalizePermissionContext relies on buildPermissionCheckContext to forward monitor's directory as cwd, and add regression tests proving relative-path Read(./...) allow and deny rules resolve against the monitor's explicit cwd. * fix(core): abort running monitors in MonitorRegistry.reset() reset() previously only cleared idle timers and emptied the map without aborting running monitors' AbortControllers. This could orphan child processes when reset() was called without a prior abortAll(), e.g. via useResumeCommand → resetBackgroundStateForSessionSwitch. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core): harden monitor notification XML and displayText - Extend escapeXml to escape " and ' as defense-in-depth: safe to reuse the helper in any future XML attribute context without re-auditing. - Strip C0 (except tab) and C1 control characters from the displayText surface before interpolation, so untrusted child-process output cannot leak ANSI escapes / NUL bytes into the operator's terminal even if a direct caller of MonitorRegistry.emitEvent skips sanitization. Adds unit tests for both hardening paths. * test(core): cover token-bucket throttling and commented-sleep bypass - Add 4 unit tests for the monitor token-bucket throttle (burst=5, 1 token/sec refill): burst cap, refill release, long-idle bucket cap, and whitespace lines not consuming budget. Uses vi.setSystemTime to exercise Date.now() without advancing pending setTimeouts. - Add an E2E case that feeds 'sleep 5 # wait for db' through the shell tool to lock in trimTrailingShellComment behavior end-to-end; the unit-level coverage in shell.test.ts remains authoritative but the E2E anchor prevents a regression from silently passing unit tests. * fix(core): address 3 remaining copilot review comments 1. shell.ts sleep interception: strip shell wrapper before detecting the blocked sleep pattern so `bash -c 'sleep 5'` / `sh -c ...` cannot route around the block. Mirrors every other sensitive check in shell.ts, which already normalizes through stripShellWrapper. 2. monitorRegistry.ts emitEvent auto-stop: settle the entry BEFORE aborting its controller so that any synchronous abort listener that flushes buffered output back through registry.emitEvent() (e.g. the Monitor tool's flushPartialLineBuffers) finds status !== 'running' and short-circuits instead of overshooting maxEvents and emitting a duplicate 'Max events reached' terminal notification. 3. monitorRegistry.ts truncateDescription: cap output at exactly MAX_DESCRIPTION_LENGTH by counting the ellipsis against the budget, instead of returning MAX_DESCRIPTION_LENGTH + 3 characters. Each fix is covered by a new unit test. * fix(core): address review comments — sanitize, notify, kill logging, throttle observability - Remove double normalize in buildPermissionCheckContext (PM is single source) - Add {notify:false} to Config.shutdown() and abortTaskRegistries() abortAll - Swap settle-before-abort in cancel() and resetIdleTimer() to prevent races - Add stripDisplayControlChars to emitTerminalNotification - Sanitize monitor description at entry creation via sanitizeMonitorLine - Surface throttle-dropped line count in terminal notification - Add .unref() to idle timer to allow clean process exit - Add error handler + stdio:ignore to Windows taskkill spawn - Log SIGTERM/SIGKILL kill failures via debugLogger.warn - Attach early child error handler to cover spawn-to-register window - Destroy child stdio on register failure to prevent handle leaks - Improve stripShellWrapper to handle absolute paths, combined flags, env prefix - Improve SHELL_TOOL_NAMES documentation and toolMatchesRuleToolName clarity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): resolve monitor tool typecheck errors - Cast child.stdout/stderr to a minimal { destroy?: () => void } shape so the optional destroy() call compiles and still works with test mocks. - Initialize droppedLines: 0 in MonitorEntry test fixtures that predate the field becoming required. * fix(monitor): add missing stdio option in taskkill test assertions (#3784) * fix(core): address monitor review feedback * fix(core): harden monitor command lifecycle --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
…#3642) * feat(core): managed background shell pool with /bashes command Replace shell.ts's `&` fork-and-detach background path with a managed process registry. Background shells now have observable lifecycle, captured output, and explicit cancellation — matching the pattern used by background subagents (QwenLM#3076). Phase B from QwenLM#3634 (background task management roadmap). What changes - New `BackgroundShellRegistry` (services/backgroundShellRegistry.ts): per-process entry with status (running / completed / failed / cancelled), AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op). Mirrors the lifecycle shape of QwenLM#3471's BackgroundTaskRegistry so the two can be unified later. - `shell.ts` is_background path rewritten as `executeBackground`: - Spawns the unwrapped command (no '&', no pgrep envelope) - Streams stdout to `<projectDir>/tasks/<sessionId>/shell-<id>.output` (path layout aligns with the direction sketched in QwenLM#3471 review) - Bridges the external abort signal into the entry's AbortController so a single source of truth governs cancellation - Returns immediately with id + output path; agent's turn isn't blocked - Settles the registry entry asynchronously when ShellExecutionService resolves: complete (clean exit) / fail (error) / cancel (aborted) - Removes ~120 lines of dead bg-specific code from shell.ts: pgrep wrapping, '&' appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup - New `/bashes` slash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message. What this PR doesn't do - Footer pill / dialog integration — gated on QwenLM#3488 landing - task_stop / send_message integration — gated on QwenLM#3471 landing - Auto-backgrounding heuristics for long foreground bash — Phase D Test plan - 11 registry unit tests (state machine + idempotent terminal transitions) - 4 background-path tests in shell.test.ts (spawn no-wrap + complete / fail / cancel settle paths) - 2 /bashes command tests (empty + populated) - Full core suite: 247 files / 6075 passed (existing tests unaffected) * fix(core): address PR QwenLM#3642 review feedback Three [Critical] from the auto review + naming alignment with Claude Code: - shell.ts settle: non-zero exit code or termination signal now bucket into `failed` instead of `completed`. The previous `if (result.error) fail else complete()` would misreport `false` / failed `npm test` as success because ShellExecutionService surfaces ordinary command failures as a non-zero exitCode with `error: null`. Failure reason carries the exit code or signal so `/tasks` shows the real cause. - ShellExecutionService.childProcessFallback: add `streamStdout` mode that emits each decoded chunk through the existing onOutputEvent path. The default (foreground) path continues to buffer + emit the cleaned final blob, so existing in-line shell calls are unaffected. executeBackground opts in via `{ streamStdout: true }`, which is what makes the captured output file actually useful for long-running processes (dev servers, watchers) — without it the file stayed empty until the process exited. - shell.ts test fixture: cancel-settle test was using `signal: 'SIGTERM'` but `ShellExecutionResult.signal` is `number | null`. TS2322 broke the build; switched to `signal: null`. Added a test that explicitly covers the new "non-zero exit → failed" path so the bucketing change has regression coverage. - shell.ts comment: explicitly document why background shells force `shouldUseNodePty=false` (no terminal, no human; node-pty would be dead weight for fire-and-forget commands). - /bashes → /tasks (alias bashes), description "List and manage background tasks" — matches Claude Code's command name. Currently lists shells only; will surface other task kinds (subagents, monitor) as those registries land via QwenLM#3471 / QwenLM#3488. * fix(core): address PR QwenLM#3642 second-round review feedback - shellExecutionService streaming: drop stdout/stderr buffer + outputChunks accumulation in streaming mode. Each decoded chunk goes straight to onOutputEvent and is GC-eligible immediately. Long-running background commands (dev servers, watchers) no longer accumulate unbounded memory proportional to total output. Buffered (foreground) mode is unchanged. - shell.ts executeBackground: stripAnsi each chunk before writing to the output file. Dev servers / build tools spam color codes and cursor-move sequences that would render as garbage in the file the agent reads. - bashesCommand: command description "List and manage" → "List background tasks" — current implementation only supports listing, cancellation follows when the unified task_stop tool from QwenLM#3471 is wired in. Replace the hand-rolled formatRuntime helper with the shared formatDuration utility (uses hideTrailingZeros for parity with the previous output). - backgroundShellRegistry: add a comment documenting the lack of an eviction policy as a known limitation. LRU / age-based / capped-size eviction (and on-disk output rotation) is left as a follow-up alongside the broader output-file lifecycle story. * fix(core): address PR QwenLM#3642 third-round review feedback - shell.ts executeBackground: add 'error' listener on the output write stream. fs.createWriteStream surfaces write failures (disk full, permission, fs going away) as 'error' events; without a listener Node treats it as an uncaught exception and kills the entire CLI session. Log + drop is the sane default — the registry still settles via resultPromise so /tasks shows the right terminal status. - shell.ts executeBackground: store the abort handler reference and removeEventListener in the settle callback. Background shells outlive the turn signal; the dangling listener was keeping `entryAc` (and transitively `outputStream`) reachable until the turn signal itself was GC'd, which for long sessions would never happen. - shell.test.ts: extend the createWriteStream mock with an `on` stub so the new error-listener wiring doesn't crash the test suite. * refactor(cli): drop /bashes alias and rename file to tasksCommand Per follow-up review: the slash command should be exclusively /tasks. Removes the `bashes` altName, renames `bashesCommand{,.test}.ts` → `tasksCommand{,.test}.ts`, renames the exported binding `bashesCommand` → `tasksCommand`, and cleans up the remaining `/bashes` references in backgroundShellRegistry.ts comments. No behavior change beyond the alias removal. * refactor(cli): finish tasksCommand rename — apply content changes The previous commit (7b8b73b75) only captured the file rename via `git mv`; the export name change (`bashesCommand` → `tasksCommand`), the removal of `altNames: ['bashes']`, the import update in BuiltinCommandLoader, and the `/bashes` → `/tasks` comments in backgroundShellRegistry.ts were unstaged when that commit landed. Squash candidate before merge. * fix(core): address PR QwenLM#3642 fourth-round review feedback Four reviewer concerns from @wenshao + @doudouOUC: - [Critical] Config.shutdown() now also calls `backgroundShellRegistry.abortAll()`. Previously only the subagent registry was aborted, so a managed background shell could outlive the CLI process and orphan its child. Symmetric with how `BackgroundTaskRegistry.abortAll()` is wired in. - [P1] shell.ts executeBackground strips a trailing `&` from the command before spawn. The managed path is itself the backgrounding mechanism; forwarding `node server.js &` verbatim made bash exit immediately while the real child outlived the wrapper, causing the registry to settle as `completed` while the shell was still running and chunked output to land on a closed stream. Strip + warn. - [P2] Output file moves under `storage.getProjectTempDir()` (specifically `<projectTempDir>/background-shells/<sessionId>/shell-<id>.output`). `ReadFileTool` already auto-allows the project temp dir, so the LLM can `Read` the captured output without bouncing off a permission prompt — important because background-agent contexts can't surface interactive prompts. - [P2] Background shells are no longer killed when the current turn's AbortSignal fires. Forwarding the turn signal into the entry's AbortController meant a Ctrl+C on the turn would also terminate intentionally backgrounded dev servers / watchers, contradicting the independent-lifecycle promise. Cancellation now flows only through `entryAc` (driven by future `task_stop` integration via QwenLM#3471). Tests: - New `abortAll` registry tests cover running / mixed / empty cases. - `runs background commands as managed pool entries` test stops asserting the wrapper-vs-entry signal identity since they're now structurally separate (no turn-to-entry forwarding). - New `does not forward the turn signal into the background shell` test pins the new behavior. - New `strips trailing & from the spawned command` test pins the strip. - Removed the cancel-via-outer-signal settle test — that path no longer exists; cancellation is exercised end-to-end via the registry's own `cancel` and `abortAll` tests in `backgroundShellRegistry.test.ts`. * fix(core): tighten trailing & strip — narrow regex + ReDoS-safe Two reviewer concerns on the same line of QwenLM#3642 round 4: - [Critical CodeQL] `\s*&+\s*$` is a polynomial-time regex on uncontrolled input (long all-`&` strings backtrack quadratically). - [P2 doudouOUC] `&+` is too greedy: it also rewrites `npm run dev &&` into `npm run dev` (breaks logical AND syntax) and `echo foo \&` into `echo foo \` (eats the escaped literal). Only the bare bash background operator should be stripped. Replace the regex with a small linear-time helper `stripTrailingBackgroundAmp` that explicitly checks for the three "don't touch" cases (`&&`, `\&`, no trailing `&`). Plain `endsWith` / `slice` — no regex backtracking, and the intent reads off the page. Tests: - Existing strip-trailing-`&` test still passes. - New `does not strip a trailing &&` test pins the logical-AND case. - New `does not strip an escaped trailing \\&` test pins the escape case. * fix(core): keep binary-detection sniff in streaming mode @doudouOUC noted that `streamStdout` shortcut returned before the binary-sniff path, so a background command emitting binary bytes (`cat /bin/ls`, image dump, etc.) would be text-decoded and appended to the task output file unbounded. Restructure handleOutput so the sniff-and-cutover logic runs in both modes: - Both modes accumulate up to MAX_SNIFF_SIZE for the binary check. The accumulator is bounded; once the threshold is reached, it stops growing in streaming mode (dropped on binary detection / left inert on text confirmation) and continues to accumulate in buffered mode (existing foreground behavior). - Streaming mode emits 'binary_detected' as soon as `isBinary` trips so the consumer can stop writing the output file. Up to ~4KB of bytes may have been emitted as text chunks before detection — this is bounded and acceptable; the unbounded write is the pathology reviewers flagged. - Streaming text mode still emits each decoded chunk immediately and does not accumulate stdout/stderr strings, so long-running text streams remain GC-friendly. - Buffered (foreground) behavior is unchanged — the sniff accumulator is the same path the existing tests cover. Tests: 50 shellExecutionService + 11 backgroundShellRegistry + 57 shell.test.ts all pass; no regressions. * fix(core): tighten streaming sniff bound + Windows rmSync flake Two unrelated reds on the latest CI run: 1. [P1 doudouOUC] Streaming sniff buffer leaks on small chunks. The previous fix recomputed `sniffedBytes` from `Buffer.concat(outputChunks.slice(0, 20)).length` on every chunk — pinned to the first 20 chunks. If those total under MAX_SNIFF_SIZE (line-sized stdout, e.g. dev-server logs) the byte count never grew, the sniff branch stayed open forever, and `outputChunks` accumulated every later chunk — exactly the leak `streamStdout` was meant to prevent. Track sniffed bytes by running sum (`sniffedBytes += data.length`) so the bound is genuine. When sniff confirms text in streaming mode, drop the accumulator immediately so subsequent chunks fall through the streaming emit path without ever touching it. 2. file-exporters.test.ts afterEach `fs.rmSync` flaked on Windows (ENOTEMPTY: directory not empty). The exporter's underlying write stream hasn't always released its handle by the time `rmSync` runs. Pass `maxRetries: 5, retryDelay: 50` so the cleanup retries through the brief Windows handle-release window instead of failing the test on a CI quirk. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
… C) (QwenLM#3684) * feat(core): event monitor tool with throttled stdout streaming (Phase C) Add a new Monitor tool that spawns a long-running shell command and streams its stdout lines back to the agent as event notifications. This is Phase C from the background task management roadmap (QwenLM#3634, QwenLM#3666). What changes: - New MonitorRegistry (services/monitorRegistry.ts): per-monitor entry with lifecycle (running/completed/failed/cancelled), idle timeout auto-stop, max events auto-stop, AbortController-based cancellation. Follows the same structural pattern as BackgroundTaskRegistry. - New Monitor tool (tools/monitor.ts): spawns via child_process.spawn with independent AbortController (Ctrl+C won't kill monitors), separate stdout/stderr line buffers, token-bucket throttling (burst=5, sustain=1/s). Returns immediately with monitor ID; events stream as notifications. - Sleep interception in shell.ts: detectBlockedSleepPattern() blocks foreground `sleep N` (N>=2) and guides model to use Monitor or is_background instead. - Config integration: MonitorRegistry instantiation, accessor, shutdown cleanup (abortAll), lazy tool registration. - CLI wiring: notification callbacks in useGeminiStream.ts (interactive) and nonInteractiveCli.ts (headless), including hold-back loop abort on exit and SIGINT cleanup. What this PR doesn't do (gated on QwenLM#3471/QwenLM#3488): - Footer pill / dialog integration - task_stop / send_message integration Test plan: - 21 MonitorRegistry unit tests (lifecycle, idle timeout, max events, XML escaping, nonexistent ID guard, callback clearing) - 20 Monitor tool unit tests (validation, spawn, line buffering, separate stdout/stderr buffers, throttling, signal-killed path, turn isolation) - 7 detectBlockedSleepPattern unit tests - 2 E2E tests (monitor invocation, sleep interception) - Full core suite: 248 files / 6151 passed Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): hold-back loop waits for monitors + emit task_started for SDK Two fixes from Codex review: P1: The non-interactive hold-back loop now includes monitorRegistry.getRunning() in its wait condition, so monitors can stream events before the CLI exits. Previously monitors were aborted immediately after the agent's first reply. P2: MonitorRegistry gains setRegisterCallback(), and nonInteractiveCli wires it to emit task_started system messages. Stream-json/SDK consumers now see a task_started for each monitor, matching the backgroundTaskRegistry contract. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): Windows process kill + pipeline sleep false-positive Two fixes from Codex review: P1: Monitor abort handler now uses `taskkill /f /t` on Windows instead of POSIX-only `process.kill(-pid)`. Follows the existing pattern in ShellExecutionService.childProcessFallback. P2: detectBlockedSleepPattern no longer uses splitCommands (which splits on `|` pipes). Replaced with a regex that only matches sleep followed by sequential separators (&&, ||, ;, &, newline), not pipes. `sleep 5 | cat` is now correctly allowed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): resolve TS errors in monitor.test.ts mock types Use Object.defineProperty for readonly ChildProcess.pid and proper Readable type for stdout/stderr mocks to satisfy strict tsc builds. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): remove false notification promise + add early-abort guard P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from QwenLM#3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to useGeminiStream tests The useGeminiStream hook now calls config.getMonitorRegistry() to wire up monitor notification callbacks. The test mock config was missing this method, causing 64 test failures with "config.getMonitorRegistry is not a function". Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to nonInteractiveCli tests Same fix as useGeminiStream.test.tsx — the mock config needs getMonitorRegistry to avoid "is not a function" errors (29 failures). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review — CORE_TOOLS, directory param, test fix 1. Add 'monitor' to PermissionManager.CORE_TOOLS so coreTools allowlist correctly gates the monitor tool (same as run_shell_command). 2. Add optional 'directory' parameter to MonitorTool with workspace validation, mirroring ShellTool's directory support for multi-root workspaces. 3. Fix sleep-interception E2E test: readToolLogs() doesn't expose toolResult, so the old assertion was dead code. Now verifies via the model's output text instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address MonitorTool review #4186888042 Addresses three [Critical] review comments on packages/core/src/tools/monitor.ts: 1. Partial-line buffer unbounded growth (processLines) MAX_LINE_LENGTH was only enforced after a newline, so a command emitting a long stream without newlines would grow buffer.value without bound and re-split the entire accumulated string on every chunk. Now, when the buffer has no newline and exceeds MAX_LINE_LENGTH, we force-emit a single truncated event through the throttled path and reset the buffer. 2. Missing type guard on params.command validateToolParamValues called params.command.trim() without a typeof check. Schema validation normally catches this, but SDK/direct callers could bypass it and hit an uncaught TypeError. Added typeof === 'string' guard, matching the pattern used for max_events / idle_timeout_ms. 3. Workspace check bypass via raw startsWith The directory validator used workspaceDirs.some(d => params.directory .startsWith(d)), which allowed prefix collisions (e.g. /tmp/project-evil against a /tmp/project workspace) and skipped canonicalisation / symlink resolution. Switched to WorkspaceContext.isPathWithinWorkspace, which already does fullyResolvedPath + segment-aware isPathWithinRoot matching and is the standard used elsewhere in the codebase. Test coverage: added 6 unit tests covering non-string command guard, non-absolute directory rejection, prefix-collision rejection, traversal rejection, workspace acceptance, and partial-line cap behaviour (including buffer reset). All 26 monitor.test.ts cases pass. The same startsWith pattern also exists in ShellTool and is tracked as a separate follow-up to keep this PR focused on Phase C scope. * fix(core): scope monitor always-allow permissions Populate Monitor confirmation permissionRules using the same command-rule extraction path as ShellTool, so ProceedAlways persists command-scoped Bash(...) rules instead of a broad monitor-level allow. Also add unit coverage for command-scoped rules, filtering already-allowed subcommands, and extractor fallback behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): decouple monitor permission scope from Bash rules Remove pm.isCommandAllowed() from MonitorToolInvocation.getConfirmationDetails() to prevent existing Bash(...) allow rules from shrinking the monitor confirmation scope. Monitor is a long-running background process with a different risk profile than one-shot shell execution and should maintain its own permission boundary. Only AST-based read-only filtering is retained. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): unify monitor error/exit cleanup to prevent resource leaks Extract a shared cleanup() helper called from both the `exit` and `error` event handlers. Previously the `error` handler did not flush buffers, clear buffer values, remove the abort listener, or log dropped-line stats — causing potential memory leaks when `error` fires without a subsequent `exit` (e.g. ENOENT for missing commands). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): add user-skills-directory guard to monitor directory validation Mirror ShellTool's getUserSkillsDirs() check in MonitorTool's validateToolParamValues() to prevent monitor commands from running inside user skills directories. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(core): add Monitor(...) permission namespace for monitor tool (QwenLM#3726) Introduce a dedicated Monitor(...) permission namespace so monitor and shell tools have independent permission boundaries. Previously monitor emitted Bash(...) rules, causing "Always Allow" to fail for future monitor invocations while unintentionally granting run_shell_command. Changes: - rule-parser.ts: add Monitor alias, SHELL_TOOL_NAMES entry, CANONICAL_TO_RULE_DISPLAY, DISPLAY_NAME_TO_VERB - permission-manager.ts: extract SHELL_LIKE_TOOLS set so evaluate(), evaluateSingle(), hasRelevantRules(), hasMatchingAskRule() handle both run_shell_command and monitor - monitor.ts: emit Monitor(...) instead of Bash(...) in permissionRules - Tests: parseRule, matchesRule, cross-tool isolation regression, buildPermissionRules, buildHumanReadableRuleLabel for Monitor Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): decouple headless monitor lifetime from final result Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): stabilize stream-json monitor session shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): deny monitor in headless approval defaults Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): honor tool aliases in headless allow checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address opus review — sleep regex, monitor cap, non-interactive cleanup - Fix sleep interception false positive for backgrounded sleep (`sleep 5 & echo done`). Remove bare `&` from separator character class so the background operator is not treated as a sequential separator. - Add MAX_CONCURRENT_MONITORS (16) check in MonitorRegistry.register() and early rejection in MonitorTool.execute() to prevent unbounded process spawning. - Widen monitorId from 8 to 16 hex chars to reduce birthday collision risk. - Abort all running monitors in nonInteractiveCli.ts success-path finally so piped stdio refs don't keep the Node event loop alive after result emission in one-shot (--print) mode. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors and background shells on /clear Without this, long-running monitors from a previous session survive /clear and continue pushing events into the new session's notification queue. This enables cross-session prompt injection where a malicious monitor persists across the user's escape hatch. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors on stream-json session shutdown Call monitorRegistry.abortAll() in both shutdown() and drainAndShutdown() so detached monitor child processes don't survive session termination. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(cli): use content event type in stream tests Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): isolate session cleanup on clear and shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): finalize session cleanup after drain Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close remaining monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve shell cwd in virtual permission checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize trailing background ampersands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor permission and wrapper handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): make monitor CI assertions cross-platform Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor wrapper normalization Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize wrapped monitor commands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor headless edge cases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve monitor spawn errors Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor register cleanup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): parse monitor wrapper script token Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review comments for monitor tool - Make Bash(...) permission rules cover monitor via toolMatchesRuleToolName, so deny rules like Bash(rm *) also block monitor({command: "rm ..."}) - Remove dead `normalizeRuleToolName` mock reference in config.test.ts - Fix tool description to mention stdout/stderr instead of just stdout - Export MAX_CONCURRENT_MONITORS from monitorRegistry and use it in monitor.ts instead of hardcoded 16 - Rename ambiguous MAX_LINE_LENGTH constants: PARTIAL_LINE_BUFFER_CAP (4096, monitor.ts) and EVENT_LINE_TRUNCATE (2000, monitorRegistry.ts) - Fix schema description text: "Max 80 characters" → "Truncated to 80 characters in display" - Add .unref() to SIGTERM→SIGKILL escalation timer to prevent 200ms exit delay Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): resolve clear command typecheck issues Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve background tasks across shutdown abort Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address latest monitor review comments Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): handle monitors across session switches Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): cover aborted monitor startup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address remaining monitor PR review comments Adopts four unresolved review threads on PR QwenLM#3684: * shell: trim top-level trailing comments before validating sleep separator so 'sleep 5 # wait' no longer bypasses detectBlockedSleepPattern. * monitor: add sanitizeMonitorLine to strip C0/C1 control chars (except tab) and defang structural envelope tag names with U+200B before forwarding output to the model, blocking prompt-injection attempts hidden in monitored stdout/stderr. * monitor: declare line buffers and throttledEmit before abortHandler to avoid TDZ on synchronous abort paths, and add flushPartialLineBuffers called from both abortHandler (before kill) and cleanup (natural exit/error) so partial-line data is no longer silently dropped on cancel. * permissions: document that normalizePermissionContext relies on buildPermissionCheckContext to forward monitor's directory as cwd, and add regression tests proving relative-path Read(./...) allow and deny rules resolve against the monitor's explicit cwd. * fix(core): abort running monitors in MonitorRegistry.reset() reset() previously only cleared idle timers and emptied the map without aborting running monitors' AbortControllers. This could orphan child processes when reset() was called without a prior abortAll(), e.g. via useResumeCommand → resetBackgroundStateForSessionSwitch. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core): harden monitor notification XML and displayText - Extend escapeXml to escape " and ' as defense-in-depth: safe to reuse the helper in any future XML attribute context without re-auditing. - Strip C0 (except tab) and C1 control characters from the displayText surface before interpolation, so untrusted child-process output cannot leak ANSI escapes / NUL bytes into the operator's terminal even if a direct caller of MonitorRegistry.emitEvent skips sanitization. Adds unit tests for both hardening paths. * test(core): cover token-bucket throttling and commented-sleep bypass - Add 4 unit tests for the monitor token-bucket throttle (burst=5, 1 token/sec refill): burst cap, refill release, long-idle bucket cap, and whitespace lines not consuming budget. Uses vi.setSystemTime to exercise Date.now() without advancing pending setTimeouts. - Add an E2E case that feeds 'sleep 5 # wait for db' through the shell tool to lock in trimTrailingShellComment behavior end-to-end; the unit-level coverage in shell.test.ts remains authoritative but the E2E anchor prevents a regression from silently passing unit tests. * fix(core): address 3 remaining copilot review comments 1. shell.ts sleep interception: strip shell wrapper before detecting the blocked sleep pattern so `bash -c 'sleep 5'` / `sh -c ...` cannot route around the block. Mirrors every other sensitive check in shell.ts, which already normalizes through stripShellWrapper. 2. monitorRegistry.ts emitEvent auto-stop: settle the entry BEFORE aborting its controller so that any synchronous abort listener that flushes buffered output back through registry.emitEvent() (e.g. the Monitor tool's flushPartialLineBuffers) finds status !== 'running' and short-circuits instead of overshooting maxEvents and emitting a duplicate 'Max events reached' terminal notification. 3. monitorRegistry.ts truncateDescription: cap output at exactly MAX_DESCRIPTION_LENGTH by counting the ellipsis against the budget, instead of returning MAX_DESCRIPTION_LENGTH + 3 characters. Each fix is covered by a new unit test. * fix(core): address review comments — sanitize, notify, kill logging, throttle observability - Remove double normalize in buildPermissionCheckContext (PM is single source) - Add {notify:false} to Config.shutdown() and abortTaskRegistries() abortAll - Swap settle-before-abort in cancel() and resetIdleTimer() to prevent races - Add stripDisplayControlChars to emitTerminalNotification - Sanitize monitor description at entry creation via sanitizeMonitorLine - Surface throttle-dropped line count in terminal notification - Add .unref() to idle timer to allow clean process exit - Add error handler + stdio:ignore to Windows taskkill spawn - Log SIGTERM/SIGKILL kill failures via debugLogger.warn - Attach early child error handler to cover spawn-to-register window - Destroy child stdio on register failure to prevent handle leaks - Improve stripShellWrapper to handle absolute paths, combined flags, env prefix - Improve SHELL_TOOL_NAMES documentation and toolMatchesRuleToolName clarity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): resolve monitor tool typecheck errors - Cast child.stdout/stderr to a minimal { destroy?: () => void } shape so the optional destroy() call compiles and still works with test mocks. - Initialize droppedLines: 0 in MonitorEntry test fixtures that predate the field becoming required. * fix(monitor): add missing stdio option in taskkill test assertions (QwenLM#3784) * fix(core): address monitor review feedback * fix(core): harden monitor command lifecycle --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Motivation
Today qwen-code's
shelltool runs background commands by appending&to the command string. The spawned process is fully detached: there's no registry, no captured output, no way for the agent to query status, no way to terminate it later. For long-running things —npm run dev,pytest, file watchers — this turns the agent into a poor manager: it can fire-and-forget, but it can't observe or steer. The user can't see what's running either.This PR makes background shells managed task pool entries with the same lifecycle, observability, and cancellation contract used by background subagents (#3076).
This is Phase B from the background tasks roadmap (#3634).
Before / After (model-facing)
The model's view of
is_background: truechanges shape:Before
(stdout discarded, no way to query status, kill via raw OS pid)
After
(stdout captured, agent can
Readthe output file,/taskslists all entries)What changes
BackgroundShellRegistry(services/backgroundShellRegistry.ts): per-process entry with status (running/completed/failed/cancelled),AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op).shell.tsis_backgroundpath rewritten asexecuteBackground:&, no pgrep envelope)<projectDir>/tasks/<sessionId>/shell-<id>.outputAbortControllerso a single source of truth governs cancellationcomplete(clean exit) /fail(error) /cancel(aborted)shell.ts: pgrep wrapping,&appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup./tasksslash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message.BackgroundShellEntryshapeshellId/tasksUIcommandcwdpidstatus'running' | 'completed' | 'failed' | 'cancelled'exitCodecompletederrorfailedstartTimeendTimeoutputPathabortControllerDesign decisions
BackgroundShellRegistryrather than reusing feat(cli): background-agent UI — pill, combined dialog, detail view #3488'sBackgroundTaskRegistry— feat(cli): background-agent UI — pill, combined dialog, detail view #3488 is still in review and the rename toBackgroundTaskEntry+kinddiscriminator (per the feat(cli): background-agent UI — pill, combined dialog, detail view #3488 review thread) hasn't landed. Building on a moving target risks rework. The new registry mirrors the lifecycle shape of feat(cli): background-agent UI — pill, combined dialog, detail view #3488's (one-shot terminal, AbortController, sidecar output path), so unification is a future rename + base-class extraction, not a structural rewrite.task_notificationpush from this PR — the notification framework lives in the subagent stack (feat: background subagents with headless and SDK support #3076 + feat(core): model-facing agent control (task_stop, send_message, per-agent transcript) #3471). Background shells settle silently in this PR; agent reads status via/tasksor by reading the output file. A follow-up PR (after feat(core): model-facing agent control (task_stop, send_message, per-agent transcript) #3471 lands) will wire shells into the same notification path.fs.mkdirSyncinstead offs.promises.mkdirinexecuteBackground—vi.mock('fs')does not recursively mock thefs.promisessub-namespace, so async API was hard to mock. The sync call happens once per spawn, before the process starts; not a hot path.What this PR doesn't do
task_stop/send_messageintegration — gated on feat(core): model-facing agent control (task_stop, send_message, per-agent transcript) #3471 landingMigration / compatibility notes
is_background: truereturn string changes shape (see Before / After above). Agents / SDK consumers parsing the old "PIDs: ..." line will need to switch to the new id + output-path format. Previous behavior was already loosely structured prose; this PR formalizes it./tasksis a new built-in command. Confirmed no name collision with existing built-in commands.<projectDir>/tasks/<sessionId>/shell-<id>.output. The<projectDir>/tasks/namespace anticipates the path-rename suggested in the feat(core): model-facing agent control (task_stop, send_message, per-agent transcript) #3471 review thread; if that lands differently, this PR's path can be adjusted in the same follow-up.Test plan
complete/fail/canceltransitions, idempotent terminal, no-op on unknown id)shell.test.ts— spawn-no-wrap (verifies command is not wrapped with&or pgrep) + 3 settle paths (complete/fail/cancel)/taskscommand tests — empty registry + populated with running / completed / failed entriesRelated
task_stopintegration)