feat(core): event monitor tool with throttled stdout streaming (Phase C)#3684
Conversation
Add a new Monitor tool that spawns a long-running shell command and streams its stdout lines back to the agent as event notifications. This is Phase C from the background task management roadmap (#3634, #3666). What changes: - New MonitorRegistry (services/monitorRegistry.ts): per-monitor entry with lifecycle (running/completed/failed/cancelled), idle timeout auto-stop, max events auto-stop, AbortController-based cancellation. Follows the same structural pattern as BackgroundTaskRegistry. - New Monitor tool (tools/monitor.ts): spawns via child_process.spawn with independent AbortController (Ctrl+C won't kill monitors), separate stdout/stderr line buffers, token-bucket throttling (burst=5, sustain=1/s). Returns immediately with monitor ID; events stream as notifications. - Sleep interception in shell.ts: detectBlockedSleepPattern() blocks foreground `sleep N` (N>=2) and guides model to use Monitor or is_background instead. - Config integration: MonitorRegistry instantiation, accessor, shutdown cleanup (abortAll), lazy tool registration. - CLI wiring: notification callbacks in useGeminiStream.ts (interactive) and nonInteractiveCli.ts (headless), including hold-back loop abort on exit and SIGINT cleanup. What this PR doesn't do (gated on #3471/#3488): - Footer pill / dialog integration - task_stop / send_message integration Test plan: - 21 MonitorRegistry unit tests (lifecycle, idle timeout, max events, XML escaping, nonexistent ID guard, callback clearing) - 20 Monitor tool unit tests (validation, spawn, line buffering, separate stdout/stderr buffers, throttling, signal-killed path, turn isolation) - 7 detectBlockedSleepPattern unit tests - 2 E2E tests (monitor invocation, sleep interception) - Full core suite: 248 files / 6151 passed Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Two fixes from Codex review: P1: The non-interactive hold-back loop now includes monitorRegistry.getRunning() in its wait condition, so monitors can stream events before the CLI exits. Previously monitors were aborted immediately after the agent's first reply. P2: MonitorRegistry gains setRegisterCallback(), and nonInteractiveCli wires it to emit task_started system messages. Stream-json/SDK consumers now see a task_started for each monitor, matching the backgroundTaskRegistry contract. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Two fixes from Codex review: P1: Monitor abort handler now uses `taskkill /f /t` on Windows instead of POSIX-only `process.kill(-pid)`. Follows the existing pattern in ShellExecutionService.childProcessFallback. P2: detectBlockedSleepPattern no longer uses splitCommands (which splits on `|` pipes). Replaced with a regex that only matches sleep followed by sequential separators (&&, ||, ;, &, newline), not pipes. `sleep 5 | cat` is now correctly allowed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…alizedTasks) Merge main into phase-c-monitor-tool branch. Conflicts resolved: - tool-names.ts: keep both TASK_STOP/SEND_MESSAGE (from main) and MONITOR - index.ts: keep both toolUseSummary (from main) and monitorRegistry exports - nonInteractiveCli.ts: adopt main's hasUnfinalizedTasks() API and add monitorRegistry.getRunning() check alongside it Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Use Object.defineProperty for readonly ChildProcess.pid and proper Readable type for stdout/stderr mocks to satisfy strict tsc builds. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Conflicts resolved: - config.ts: keep both MonitorRegistry and BackgroundShellRegistry (import, instantiation, shutdown abortAll, accessor) - index.ts: keep both monitorRegistry and backgroundShellRegistry exports Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from #3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The useGeminiStream hook now calls config.getMonitorRegistry() to wire up monitor notification callbacks. The test mock config was missing this method, causing 64 test failures with "config.getMonitorRegistry is not a function". Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Same fix as useGeminiStream.test.tsx — the mock config needs getMonitorRegistry to avoid "is not a function" errors (29 failures). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
1. Add 'monitor' to PermissionManager.CORE_TOOLS so coreTools allowlist correctly gates the monitor tool (same as run_shell_command). 2. Add optional 'directory' parameter to MonitorTool with workspace validation, mirroring ShellTool's directory support for multi-root workspaces. 3. Fix sleep-interception E2E test: readToolLogs() doesn't expose toolResult, so the old assertion was dead code. Now verifies via the model's output text instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Addresses three [Critical] review comments on packages/core/src/tools/monitor.ts: 1. Partial-line buffer unbounded growth (processLines) MAX_LINE_LENGTH was only enforced after a newline, so a command emitting a long stream without newlines would grow buffer.value without bound and re-split the entire accumulated string on every chunk. Now, when the buffer has no newline and exceeds MAX_LINE_LENGTH, we force-emit a single truncated event through the throttled path and reset the buffer. 2. Missing type guard on params.command validateToolParamValues called params.command.trim() without a typeof check. Schema validation normally catches this, but SDK/direct callers could bypass it and hit an uncaught TypeError. Added typeof === 'string' guard, matching the pattern used for max_events / idle_timeout_ms. 3. Workspace check bypass via raw startsWith The directory validator used workspaceDirs.some(d => params.directory .startsWith(d)), which allowed prefix collisions (e.g. /tmp/project-evil against a /tmp/project workspace) and skipped canonicalisation / symlink resolution. Switched to WorkspaceContext.isPathWithinWorkspace, which already does fullyResolvedPath + segment-aware isPathWithinRoot matching and is the standard used elsewhere in the codebase. Test coverage: added 6 unit tests covering non-string command guard, non-absolute directory rejection, prefix-collision rejection, traversal rejection, workspace acceptance, and partial-line cap behaviour (including buffer reset). All 26 monitor.test.ts cases pass. The same startsWith pattern also exists in ShellTool and is tracked as a separate follow-up to keep this PR focused on Phase C scope.
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
The previously identified exported type rename issue no longer maps to the current PR diff, so this review only includes the remaining high-confidence blocker.
Populate Monitor confirmation permissionRules using the same command-rule extraction path as ShellTool, so ProceedAlways persists command-scoped Bash(...) rules instead of a broad monitor-level allow. Also add unit coverage for command-scoped rules, filtering already-allowed subcommands, and extractor fallback behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Remove pm.isCommandAllowed() from MonitorToolInvocation.getConfirmationDetails() to prevent existing Bash(...) allow rules from shrinking the monitor confirmation scope. Monitor is a long-running background process with a different risk profile than one-shot shell execution and should maintain its own permission boundary. Only AST-based read-only filtering is retained. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
Review: feat(core): event monitor tool with throttled stdout streaming (Phase C)
Ran 9 parallel review agents + verification + reverse audit on this iteration (diff since prior review commit 3bc8622e). All prior review findings have been addressed — no new high-confidence issues found.
Deterministic analysis: ESLint: 0 findings. tsc: 4 pre-existing type errors in packages/cli/src/config/config.test.ts (mcpServers possibly undefined) — not introduced by this PR.
Build & tests: All pass (910 tests, 0 failures).
Low-confidence items for human review (from LLM agents, verified but below the threshold for inline comments):
shell.tsdirectory validation still usesstartsWith()while Monitor usesisPathWithinWorkspace()— the PR adds newcwdpropagation to PM evaluation that extends the weaker check's reachmonitor.tsexit handler'scleanup()may settle via maxEvents before capturing the real exit code (low probability race)- Duplicated
escapeXml(5-char vs 3-char escaping), control-char stripping, and background-work helpers across files — should be unified before the planned#3488refactor tool-utils.tsSHELL_TOOL_NAMESmissing'monitor'/'MonitorTool'aliases — out of sync withrule-parser.tsandpermission-manager.tsbackground-tasks.tsabortAll({ notify: false })setsnotified=trueon all entries including already-completed ones, narrowing the notification windowpermissions/types.tscommandfield docstring only mentionsBash— now also applies to Monitor
All items above are tracked or deferred to follow-up PRs per prior review discussions.
— glm-5.1 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Review Summary
Build & Test: ✅ All CI checks pass (11/11). Build passes. CLI tests: 4785 passed. Core tests: 6503 passed (2 pre-existing failures in skill-manager.test.ts, unrelated).
Critical
- [typecheck]
packages/cli/src/config/config.test.ts: TypeScript type errors at lines 1601, 1606 ('mcpServers' is possibly 'undefined', TS18048) and 1619, 1633 (Object is possibly 'undefined', TS2532).tsc --noEmitfails on these.
Suggestions (posted as inline comments)
monitor.ts:449—child.removeAllListeners()strips error handler in catch blockmonitorRegistry.ts:185—cancel()settle-before-abort loses partial-line outputclearCommand.ts:31— Unnecessary unsafe type-cast for.reset()callsclearCommand.ts:19— Duplicated helper functions (also inuseResumeCommand.ts)shell.ts:82—getSleepSequentialSeparatormissing&separatorshell.ts:144— Duplicate shell-quoting state machine
Verified & Rejected
7 findings rejected after cross-reference verification — including sleep detection bypass (intentional design), Bash→Monitor rule inheritance (documented + tracked in #3726), persistedStatus default (intentional), and trailing & stripping regression (factually incorrect — old code only stripped in background path).
— deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
4 critical 全部验证修复(partial-line flush 顺序、register-failure async error 吞错、removeAllListeners 拆 absorber、-o pipefail wrapper bypass)。CI 13/13 绿,5 天多轮 review 收敛完毕。延后的若干条建议在 follow-up issue 中跟踪(已在 inline 回复中点名)。LGTM ✅
Follow-up 清单PR 主体已 LGTM 合入,下面这些 finding 在 review 中达成共识 deferred 到后续 issue / PR,集中列在这里方便追踪: Security hardening(defense-in-depth,非 blocker)
Resource budget / DoS 防护
Permission / rule 系统一致性
Race / 状态一致性
DRY / 维护性整理
后续阶段(gated)
如果 review 中漏标的 finding 也属于这些方向,欢迎补充到这个清单或提交独立 issue。 |
本地功能验证切到 PR HEAD( LLM 调用链路本身已经被 测试方法最小 harness( git fetch origin pull/3684/head:pr-3684 && git checkout pr-3684
npm run build
tmux new-session -d -s qwen-monitor -x 200 -y 60
tmux send-keys -t qwen-monitor 'node scripts/test-monitor-tool.mjs' Enter场景 & 结果6 个场景全部通过,实际行为与 PR 设计完全对齐:
实际日志(精简,已去 ANSI)重点观察项(对应代码位置)
测试产物
PR 行为与设计 / 注释 / 单测覆盖一致,本轮验证未发现偏差。 |
Captures current state of the bg-agent subsystem in the fork (what's already wired, what is not), maps the upstream qwen-code PRs we have not yet ported (QwenLM#3076 → QwenLM#3739), and sketches three phases to close the gap: - Phase A: model-facing agent control + event monitor (QwenLM#3471, QwenLM#3684, QwenLM#3687) - Phase B: TUI surface + /tasks command (QwenLM#3488, QwenLM#3642) - Phase C: cross-session resume (QwenLM#3739) Also calls out cross-cutting decisions we should make before Phase A lands: settings layout, stop-tool naming, persistence shape, gateway validation. This is a planning doc, not a spec. Per-phase code-level designs come later. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) * feat(cli): include monitors in /tasks + add interactive-mode hint Phase B closure for Issue #3634. Two coupled changes to /tasks: 1. **Bug fix — include monitors.** The command was last touched before #3684 / #3791 landed, so it merged only agent + shell entries while monitors silently disappeared from the headless / non-interactive / ACP listing path. Add a third registry pull from `getMonitorRegistry()` and wire monitor through statusLabel / taskLabel / taskId / taskOutputPath. Status line includes eventCount (`running (N events)`, `completed (exit 0, N events)`, `completed (Max events reached, N events)` for auto-stop) and pid where defined. 2. **Soft deprecation hint, scoped to interactive mode only.** Once the richer Ctrl+T dialog (#3488 + #3720 + #3791) is available, the text dump is the long-form fallback rather than the primary surface. Show `Tip: Ctrl+T opens the interactive Background tasks dialog with detail view + live updates.` at the top of the output when `executionMode === 'interactive'`. Headless / ACP get the bare list — they have no dialog to point at and the hint would just clutter. Description string also clarified to call out the modal split. Kept on all three executionModes (no deletion) — `/tasks` is the only way headless / ACP / SDK consumers can inspect background-task state. Tests: 4 new cases in tasksCommand.test.ts cover monitor entry formatting (running with pid, natural completion with exitCode, auto-stop with error string, failed), the singular `1 event` form, the interactive-mode hint gating, and the cross-kind merge order. * fix(cli): address PR 3801 review — exhaustive switch + i18n + extra tests Three actionable Suggestions from /review's pass: - `taskLabel` rewritten as a `switch` with a `never`-typed `default` arm, matching the structural-safety pattern already used by `taskId`. Adding a 4th DialogEntry kind in the future will now flip both helpers to compile errors instead of letting `taskLabel` silently fall through to `entry.description` (which the new kind may not have). - Hint string wrapped in `t()` for i18n consistency with the rest of the file. The literal stays as the i18n key default, so today's output is unchanged. - Tests: cover `cancelled` monitor status (was the only one without an inline assertion) and explicit `acp` execution mode hint suppression (pins the suppression rationale so a future regression flipping the check to `!== 'non_interactive'` would fail loudly). * fix(cli): correct /tasks dialog-open hint — Ctrl+T was wrong Tmux verification on PR #3801 caught that the hint string says "Ctrl+T opens the interactive Background tasks dialog" but Ctrl+T is actually bound to the MCP tool descriptions toggle (ContextSummaryDisplay.tsx lines 110-115). The dialog opens via Down arrow on an empty composer (focuses the footer pill) followed by Enter (InputPrompt.tsx 947-968). Same misattribution slipped into PR #3791's first description and was caught + fixed there before merge — this PR carried the wrong wording forward in code. Updates four sites: - The hint string itself: "Tip: press ↓ from an empty composer then Enter to open the interactive Background tasks dialog with detail view + live updates." - The slash-command description: "interactive UI is Ctrl+T" → "interactive dialog opens via the footer pill" - Two inline comments referencing Ctrl+T as the dialog opener - The interactive-mode hint test now pins on `↓` + `Enter` and asserts `not.toContain('Ctrl+T')` so a regression to the wrong wording fails loudly. * fix(cli): address PR 3801 review — exhaustive switch consistency + path-agnostic hint Four Suggestions from the latest /review pass: - `statusLabel` rewritten as a single top-level switch with a `never`-typed default, matching `taskLabel` / `taskId` / `taskOutputPath`. The previous `if`/`if`/fallthrough form would silently apply monitor formatting to a future 4th kind. - `taskOutputPath` gained the same exhaustive default — was the only per-kind helper still relying on implicit fallthrough; would silently omit a 4th-kind output path while the adjacent helpers flip to compile errors. - Hint wording de-specifies the exact keystroke count: `'Tip: focus the Background tasks pill in the footer (use ↓ from an empty composer) and press Enter ...'`. Previous "press ↓ then Enter" phrasing was wrong when the Arena agent tab bar is present — `InputPrompt`'s focus chain routes Down through the tab bar first, so a single Down lands there, not on the bg pill. - Test pin tightened: `[mon_fail] failed: spawn ENOENT (0 events)` is now a full-string assertion instead of a prefix match, so a regression that drops the `(N events)` suffix from monitor's failed branch fails loudly. * fix(cli): sanitize ANSI escape sequences in /tasks output deepseek's review pass flagged that monitor description / error fields are user / process-supplied strings rendered directly to the terminal. A maliciously-crafted tool description or spawn error containing raw ANSI control sequences (clear-screen, cursor-move, colour) would otherwise reach stdout verbatim and corrupt display. Same risk applies to agent error / description and shell error / command — all already-existing renderers with the same exposure that this PR didn't introduce but inherits. So instead of per-field sprinkling, wrap the joined output once with `escapeAnsiCtrlCodes` (no-op when no control chars present, so cost is zero in the common case). One line change in the renderer covers every kind including any future one. Test pins the behaviour: a monitor entry with `\x1b[2J` / `\x1b[31m...` content produces output with no raw ESC bytes and visible escaped `�[...]` sequences. * docs(cli): tighten escapeAnsiCtrlCodes comments to match actual scope Two doc-precision Suggestions from copilot's pass on 0840e32: - Source comment claimed `escapeAnsiCtrlCodes` is "a no-op when no control chars" but it's narrower than that — it only handles sequences matched by `ansi-regex` (CSI / OSC / SGR — anything starting with ESC). Isolated C0/C1 control bytes like BEL, BS, VT pass through untouched. Updated the comment to enumerate the actual scope and call out that `node:util`'s `stripVTControlCharacters` would be needed if those become a concern. - Test comment had a literal raw ESC byte (octal 033) embedded in the source — visually showed `^[[...]` in editors that render ESC, but was a real ESC byte in the file rather than the escaped `�` form the sanitizer produces. Rewrote with a literal `�` text description so what the comment shows matches what the assertions check for. * fix(cli): broaden /tasks sanitization + tighten inner switch exhaustiveness Addresses 3 of 5 items from doudouOUC's PR 3801 review: - **Issue 1 (Low) — C0/C1 control byte gap**: switched from `escapeAnsiCtrlCodes` (only handles ESC-initiated ANSI sequences) to `stripUnsafeCharacters` (one-pass strip of ANSI + VT + C0/C1, with TAB/CR/LF preserved). The pre-existing exposure to bare BEL / BS / FF / VT bytes via shell entry strings is now closed for all three kinds. Test rewritten to cover both ANSI sequences AND bare control bytes (BEL, BS), and pins that surrounding printable text and line breaks survive. - **Issue 2 (Low) — inner status switches inconsistent**: the three inner `switch (entry.status)` blocks (agent / shell / monitor) used `case 'running': default: return 'running'` (or duplicated bodies). All three now have explicit `running` cases plus a `never`-typed default that throws — matches the outer `switch (entry.kind)` pattern and means a future status added to any of `BackgroundTaskEntry` / `BackgroundShellEntry` / `MonitorStatus` flips to a compile error here instead of silently returning `'running'`. - **Issue 5 (Nit) — beforeEach default change**: added an inline comment explaining why the test default overrides `createMockCommandContext`'s `'interactive'` default (`'non_interactive'` lets the hint-suppression assertions work without each test rebinding context). Issues 3 and 4 from the review are nits with no action needed (3 is already documented as intentional; 4 is a UX call about hint length that's better handled by user feedback than guess-tweaking). * fix(cli): bind status to local before exhaustive switch — fixes tsc build CI's `tsc --build` (full mode, vs `--noEmit` locally) caught that `switch (entry.status)` followed by a `never`-typed default reading `entry.status` doesn't compile. After the case arms exhaust the discriminated union, TS narrows `entry` itself to `never`, so the `.status` access in the default arm becomes "Property 'status' does not exist on type 'never'" + the resulting `any` value can't be assigned to `never`. Fix: bind `entry.status` to a local `status` const before the inner switch. The local stays typed as the per-kind status union and narrows correctly to `never` at the default arm — `const _exhaustive: never = status` is then `never = never`, valid. Standard exhaustive-switch-on-discriminator pattern; doesn't change runtime behavior or test surface, just gets past TS narrowing on the nested case.
Summary
sleep N(N>=2), including wrapped/commented forms, and guides the model to use Monitor oris_backgroundinsteadtask_stop, shutdown/session-switch cleanup, and core tool gatingThis is Phase C from the background task management roadmap (#3634), tracked by #3666.
Design
The Monitor tool follows the same structural pattern as the background task infrastructure:
AbortControllerper monitor (Ctrl+C won't kill monitors)<task-notification>XML envelope with<kind>monitor</kind>useGeminiStream.ts) and headless/stream-json (nonInteractiveCli.ts,nonInteractive/session.ts) paths, includingtask_startedandtask_notificationSDK eventsPermissions and safety
Monitor(...)permission namespace for monitor-specific always-allow rulesBash(...)rules relevant to monitor evaluation so existing shell deny/ask rules cannot be bypassed by switching to Monitordirectoryparameter with workspace validation and user-skills-directory guardWhat this PR doesn't do
send_messageintegrationTest plan
Monitor(...),Bash(...)cross-tool matching, cwd-aware virtual file operation checks, and core tool gating/clear,/resume, andtask_stopTmux verification (2026-05-08)
Full tmux transcript retained in the worktree at
.qwen/e2e-tests/pr-3684-tmux-clean-20260508-141554.log. Relevant input/output excerpt:Related
🤖 Generated with Qwen Code