docs(tools): document monitor tool#4356
Conversation
📋 Review SummaryThis PR adds documentation for the built-in 🔍 General Feedback
🎯 Specific Feedback🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…[skip ci] CI dry-run regression on PRs #4356/#4110 surfaced two issues: - ULTRA_LIGHT verdict showed "Changed lines: 0" for a docs-only PR because it reused the size-gate's meaningful (docs/lockfile-filtered) count for display. Add a raw_changed_lines output (additions+deletions) and use it in the UL verdict and preflight context — the meaningful count still drives the size gate only. - DEEP review on #4110 hit the 25m cap mid-consolidation; the partial flush captured only orchestrator narration, not findings, because the bundled multi-agent skill keeps findings in sub-agent transcripts. Raise the cap to 40m (step 45m) and, on any non-clean DEEP exit, post an explicit re-run notice instead of the misleading narration. STANDARD/ LIGHT are single-shot so their partial flush is kept unchanged. Design doc updated to document the DEEP-vs-single-shot distinction.
Qwen Code Review — ULTRA_LIGHTPreflight triage decided this PR does not need a deep review. Rationale: Pure docs — 3 files all under docs/developers/tools/; no code, no runtime, no API, no build, no data, no security surface. All 5 blast-radius dimensions false.
Reply Reviewed by |
wenshao
left a comment
There was a problem hiding this comment.
Security Review — PR #4356 (docs: monitor tool)
Verdict: COMMENT — 3 Suggestions, no Critical/High findings.
The core security boundaries documented here (permission model, workspace restriction, command substitution rejection, orphan process behavior) are accurate and match the source code. The issues below are factual inaccuracies and incomplete security warnings that could mislead users.
Suggestion 1: Line truncation limit is wrong (2000 vs 4096)
File: docs/developers/tools/monitor.md, line 134
Severity: Suggestion
The doc states:
individual lines longer than 2000 characters are truncated
The source code uses PARTIAL_LINE_BUFFER_CAP = 4096 (packages/core/src/tools/monitor.ts:64). Both the partial-line force-flush path and the per-line truncation path use this constant. The value "2000" should be corrected to "4096".
Suggestion 2: Prompt injection warning is too vague
File: docs/developers/tools/monitor.md, lines 137-140
Severity: Suggestion
The current warning reads:
Structural notification tags are defanged, but the model still reads each line's text, so avoid monitoring streams that external parties can write to unless you trust the model to ignore embedded instructions.
This is directionally correct but doesn't explain the threat model clearly enough for users to make informed decisions. Consider expanding to cover:
- Why it matters: Monitor output flows into the agent's context as
<task-notification>content. Even though structural envelope tags are defanged (XML structure spoofing is prevented) and XML is escaped in the registry, the model reads the raw text of each line. - Concrete threat: An attacker who can write to a monitored log or stream (e.g., a shared log file, a public chat channel, a web server access log) could embed natural-language instructions that the model might act on — this is a prompt injection vector, not just an XML spoofing vector.
- What "defanged" does NOT protect against: Tag defanging only prevents spoofing
</task-notification>boundaries. It does not prevent natural-language injection like "Ignore previous instructions and..."
Suggested rewrite:
Structural notification tags are defanged to prevent XML structure spoofing, and XML content is escaped in the registry. However, the model reads each line's raw text, so monitoring streams that untrusted parties can write to is a prompt injection risk. An attacker who controls lines in a monitored log could embed natural-language instructions the model might act on. Avoid monitoring such streams unless you accept this risk.
Suggestion 3: Missing warning about environment variable exposure
File: docs/developers/tools/monitor.md, "Important notes" section
Severity: Suggestion (low priority)
The monitor spawns child processes with env: { ...process.env, ... } (monitor.ts ~line 350), inheriting the full parent environment including API keys, session tokens, and other sensitive credentials. Commands that introspect their environment (e.g., env, printenv, set) or that log environment variables could expose these secrets in the notification stream.
Consider adding a brief note like:
Environment exposure: Monitor commands inherit the full Qwen Code process environment (including API keys and tokens). Avoid monitoring commands that print or log environment variables.
Verified as accurate (no issues)
- Permission model (read-only auto-allow, state-changing ask, command substitution deny) — matches
getDefaultPermission()inmonitor.ts - Workspace restriction with symlink resolution — matches
isPathWithinWorkspace()usingrealpathSync - Independent permission boundary from
run_shell_command— matches the explicit comment atmonitor.ts:204-211 - 16 concurrent monitor cap — matches
MAX_CONCURRENT_MONITORS = 16 - SIGTERM → SIGKILL after ~200ms — matches
monitor.ts:417-430 - Windows
taskkill /f /t— matchesmonitor.ts:411-416 - Background operator (
&) stripping and rejection — matcheshasUnsafeMonitorBackgroundOperator+normalizeMonitorShellCommand - Rate limiting (burst 5, ~1/sec) — matches
THROTTLE_BURST_SIZE = 5andTHROTTLE_REFILL_INTERVAL_MS = 1000
wenshao
left a comment
There was a problem hiding this comment.
Second-opinion review with qwen3.7-max. All technical claims verified against source code — defaults, limits, concurrency, permissions, and signal handling all match. The open round-2 Suggestion (process-exit status mapping in the auto-stop bullet) remains valid and unaddressed. No new findings.
— qwen3.7-max via Qwen Code /review
The monitor tool's auto-stop bullet only mentioned max_events and idle_timeout_ms. It omitted the most common stop trigger -- the underlying command exiting on its own -- and did not explain how settleFromExit maps the process's exit outcome onto monitor status. Operators reading "Monitor failed (3 events)" could reasonably conclude the monitor tool itself broke when the command merely exited with a non-zero code. Expand the bullet to list command-exit as a third auto-stop trigger and spell out the three status mappings: code 0 -> completed, non-zero code -> failed with "Exit code N", signal -> failed with "Killed by signal SIG". This matches the behavior in packages/core/src/tools/monitor.ts (settleFromExit).
Summary
monitortool and linked it from the tools overview and navigation.Validation
monitortool and place it where it best fits in the docs.monitor, how to inspect or stop monitors, and when to use background shell commands instead.200 http://localhost:3000/developers/tools/monitor. The full monorepo build did not complete becausepackages/clicurrently fails TypeScript compilation on existingserve/acp-bridgeexport and module errors unrelated to this docs-only change.Scope / Risk
Testing Matrix
Testing matrix notes:
npm run build && npm run typecheckwas attempted on macOS but stopped at existingpackages/cliTypeScript errors before typecheck ran.npx prettier --check ...passed on macOS.Linked Issues / Bugs