feat(serve): safe workspace file read routes (#4175 PR 19)#4269
Conversation
Closes PR #4250 follow-up #4. Hashing the per-call cwd for glob audit produced a different pathHash for every subdirectory glob without giving operators any actionable difference (raw paths are privacy-gated). Replace the hash basis with the bound workspace itself and surface the literal pattern on a new schema field, so every glob row carries a stable workspace marker and a per-call pattern. The pattern field also fires on parse_error denials (path-escape patterns, non-relative patterns) so audit consumers debugging a production glob rejection can see the exact rejected pattern without needing QWEN_AUDIT_RAW_PATHS=1.
Add four read-only HTTP routes that consume PR 18's per-request WorkspaceFileSystem boundary: - GET /file?path=... text content + meta (encoding/BOM/lineEnding) - GET /list?path=... directory entries (name/kind/ignored) - GET /glob?pattern=... workspace-relative match paths - GET /stat?path=... file/directory metadata The routes share one error envelope (sendFsError) that maps FsError.kind through the boundary's existing DEFAULT_STATUS_BY_KIND table to a typed JSON response. All four 200 responses set Cache-Control: no-store and X-Content-Type-Options: nosniff so a browser-adjacent client cannot cache or sniff source content. Routes are advertised under a single workspace_file_read capability tag — the four endpoints share the same backing boundary and the same failure shape, so per-route tags would force four simultaneous registry entries with no operator-meaningful difference between them. Mutating routes will ship in PR 20 under their own workspace_file_write tag. Trust gate is unchanged: read intents pass on untrusted workspaces per PR 18's policy.ts. Auth follows the global bearer flow only; read routes never run mutate(), since none of them mutate state.
Closes PR #4250 follow-up #2. runQwenServe now constructs a WorkspaceFileSystem factory from the bound workspace, threads its emit hook through to the read routes, and exposes the trust snapshot via deps.trustedWorkspace. Test additions pin the wiring contract: - audit events emitted on success / denial flow back through the test-supplied fsAuditEmit hook - deps.fsFactory override is honored (built-in default does not silently shadow injection) - trust snapshot defaults to true (operator-chosen workspace) - trust=false routes through to the boundary and trips untrusted_workspace on write intents Default emit stays a stderr warning so a wiring regression that drops events remains visible. PR 21's SSE fan-out will replace the default with a workspace-scoped event channel.
📋 Review SummaryThis PR (#4269) implements four read-only HTTP routes ( 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds a new read-only “workspace file read” surface to the qwen serve daemon by registering four GET routes backed by the existing WorkspaceFileSystem boundary, and wires runQwenServe to inject an fsFactory (with trust snapshot) so auditing and boundary policy flow through consistently.
Changes:
- Register new read-only routes:
GET /file,/list,/glob,/stat, with shared security headers and a unifiedsendFsErrorenvelope. - Inject
fsFactory(and bound workspace) viarunQwenServe → createServeApp, and advertise the newworkspace_file_readcapability. - Extend fs audit payloads to include
patternfor glob events and hash the workspace consistently for globpathHash; add targeted tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/cli/src/serve/server.ts | Registers the new workspace file read routes and exposes boundWorkspace on app.locals. |
| packages/cli/src/serve/server.test.ts | Updates capabilities expectations and adds runQwenServe wiring/trust/fsFactory override tests. |
| packages/cli/src/serve/runQwenServe.ts | Constructs/injects fsFactory with a trust snapshot and optional audit emit hook. |
| packages/cli/src/serve/routes/workspaceFileRead.ts | Implements `GET /file |
| packages/cli/src/serve/routes/workspaceFileRead.test.ts | Adds route-level tests covering success/error cases, headers, and auditing behavior. |
| packages/cli/src/serve/fs/workspaceFileSystem.ts | Enhances glob auditing (workspace-hashed pathHash, literal pattern in access/denied). |
| packages/cli/src/serve/fs/workspaceFileSystem.test.ts | Adds tests asserting glob audit pattern and workspace-hashed pathHash. |
| packages/cli/src/serve/fs/audit.ts | Extends audit payload schemas and publisher to include optional pattern. |
| packages/cli/src/serve/fs/audit.test.ts | Adds tests verifying pattern is emitted/omitted appropriately for access/denied. |
| packages/cli/src/serve/capabilities.ts | Adds always-on workspace_file_read capability tag. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Closes 8 findings from Copilot inline review + Codex review on PR #4269 (5 P0, 3 P1): P0 (correctness / privacy / operations) - runQwenServe.ts: throttle the default fsAuditEmit by reusing the exported `createDefaultFsAuditEmit` from server.ts. The earlier per-event `writeStderrLine` would print one line for every /file/list/glob/stat audit event under normal traffic. Now warns once + every 100th drop with payload context, so a wiring regression is still visible without flooding logs. (Copilot runQwenServe.ts:316; Codex runQwenServe.ts:305) - routes/workspaceFileRead.ts: probe glob with maxResults+1 and trim, so `truncated` reflects whether the boundary actually had more matches. Earlier `length === maxResults` heuristic false-positived when the workspace happened to hold exactly N matches. (Copilot workspaceFileRead.ts:399) - routes/workspaceFileRead.ts: glob `relMatches` now flows through the shared `workspaceRelative` helper. Root match (`pattern=.`) renders as "." rather than the empty string `path.relative` returns; helper also covers the boundWorkspace-undefined edge case so the route no longer carries its own fallback branch. (Copilot workspaceFileRead.ts:388; review summary HIGH-1) - fs/audit.ts: `pattern` field now rides on the same privacy gate as `relPath` / `message`. Glob patterns commonly carry workspace-relative or absolute path fragments (`src/secrets/*.env`, rejected `/Users/alice/ws/**`), so emitting them in privacy mode bypassed the same redaction the other path-bearing fields honor. Operators wanting full forensic context opt in via QWEN_AUDIT_RAW_PATHS=1. (Codex audit.ts:249) - routes/workspaceFileRead.ts: cwd resolves with intent='list' rather than 'glob'. The orchestrator's `recordAndWrap` auto-derives `data.pattern` from `intent === 'glob'`, which turned cwd-resolution failures into rows where the cwd string masqueraded as the glob pattern (`?cwd=../outside` → `pattern: ../outside` in audit). Switching to 'list' is the correct semantic shape (cwd is a directory we intend to walk) with identical trust + path-resolution behavior. (Codex workspaceFileSystem.ts:941) P1 (cosmetic / comment accuracy) - server.test.ts: `honors deps.fsFactory override` test comment rewritten to match the actual failure mode (a regression would 404 on a.txt, not 200 against package.json). (Copilot server.test.ts:3219) - routes/workspaceFileRead.ts: `limit` error message uses the MAX_LIST_ENTRIES constant instead of the literal 2000. (review summary MEDIUM) - fs/audit.ts: expanded the JSDoc explaining why the AuditPublisher request types Omit four fields and pass `pattern` through. (review summary MEDIUM) Test additions / adjustments - audit.test.ts: split the existing pattern tests into raw-paths and privacy-default cases; added two new privacy-mode assertions that strip pattern under default config. - workspaceFileSystem.test.ts: harness accepts `includeRawPaths`; glob audit suite runs with raw paths to observe `pattern`; new `glob audit privacy default` suite asserts pattern + relPath are stripped without the env opt-in. - workspaceFileRead.test.ts: new GET /glob cases for the truncated edge case (count == maxResults → false; count > maxResults → true) and root-match normalization. Not adopted (with rationale) - review summary HIGH-2 (glob pathHash uses boundWorkspace): this is the deliberate follow-up #4 contract from PR 18; pattern is the per-call signal, pathHash is the workspace marker. - review summary MEDIUM-1 (parseIntInRange three-state return): matches `parseMaxQueuedQuery` in server.ts; consistency wins. - review summary LOW-1/2/3 (capabilities comment length, CSP header, reverse truncated:false assertion): rationale already documented in code, CSP belongs in a hardening PR, the reverse assertion already exists. 518/518 serve tests pass; typecheck + eslint clean within src/serve/.
wenshao
left a comment
There was a problem hiding this comment.
Round 3 review — new findings only. Round-1/2 已 raise 的项(_deps、MAX_LIST_ENTRIES 复用、err.stack PII、workspaceRelative silent fallback、/file ?line/?limit 测试、/glob ?includeIgnored/?cwd 测试、/stat path_not_found 测试)不重复。
本轮重点:response shape 一致性 + 测试加固 + audit 文档挂载位置。
— Claude Opus 4.7 (1M context) via Claude Code /review
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
本地实测报告(维护者 review)在 自动化检查
四个路由 happy path
响应头 安全边界(全部正确拒绝)
审计 + 关闭
小观察(不阻塞)非 GET 方法( 代码质量、测试覆盖、安全设计、审计串联、优雅关闭均达标,LGTM,可以合并。 |
wenshao
left a comment
There was a problem hiding this comment.
本地实测通过(525/525 tests + 安全边界全部正确拒绝 + 审计串联确认),详见上面的测试报告。
wenshao
left a comment
There was a problem hiding this comment.
Qwen Code Review — PR #4269
Verdict: Approve with suggestions
The implementation is solid — good boundary enforcement via WorkspaceFileSystem, comprehensive audit logging, and thorough test coverage. The design cleanly separates read-only routes from the write path, and the boundary check (workspace root canonicalization + relative-path escape detection) is correct.
No critical or high-confidence issues found. The following are low-confidence suggestions and nice-to-haves:
Suggestion (low confidence): Glob per-hit sequential realpath + lstat
File: workspaceFileSystem.ts — globAsync post-filter loop (lines 636, 676)
The glob result loop awaits fsp.realpath(absolute) and then fsp.lstat(canonical) one-by-one for every match. With large **/* patterns this is O(2n × syscall_latency).
Options to consider (future follow-up, not blocking):
- Batch realpaths with
Promise.allin chunks (e.g., 64 at a time) to exploit kernel I/O parallelism. - Use
fs.realpath.native(libuv threadpool) instead of the JS polyfill for lower overhead. - Expose a
maxResultscap in the route contract to bound worst-case latency.
Suggestion (low confidence): Glob ignore list is hardcoded
File: workspaceFileSystem.ts — globAsync call (lines 610–617)
The walk-time ignore list is ['**/node_modules/**', '**/.git/**'] — a static two-entry list. The PR author's comment acknowledges this is intentional ("post-filter via shouldIgnore is still authoritative — this is purely a walk-time optimization"). However, common workspace directories like dist/, build/, coverage/, and .cache/ will still be walked by glob and only filtered post-hoc. For large monorepos, forwarding the workspace's ignore rules to glob's ignore would reduce unnecessary I/O.
Nice to have: MAX_FILE_LINE_LIMIT mismatch
The route uses Math.min(maxLines, 10000) while WorkspaceFileSystem.readText enforces maxLines <= MAX_FILE_LINE_LIMIT (default 10000). If MAX_FILE_LINE_LIMIT is ever changed or made configurable, the two caps could diverge. Consider importing MAX_FILE_LINE_LIMIT from workspaceFileSystem.ts and using Math.min(maxLines, MAX_FILE_LINE_LIMIT) in the route for a single source of truth.
Nice to have: Route coverage
The PR adds GET /file, /list, /glob, /stat — all read-only. The old /read_resource MCP tool covered listDirectory + readText but not glob or stat. The new routes are a superset. Good coverage.
Summary: Well-scoped PR with clean boundary enforcement and comprehensive tests. No blocking issues. The suggestions above are optimizations for large-workspace edge cases.
PRs #4249 (workspace memory + agents CRUD) and #4269 (workspace file read routes) added `workspace_memory`, `workspace_agents`, and `workspace_file_read` to `SERVE_CAPABILITY_REGISTRY` and updated the unit-level `EXPECTED_STAGE1_FEATURES` in `packages/cli/src/serve/server.test.ts`, but missed the matching integration-test expectation. The E2E `qwen serve — capabilities envelope > advertises all baseline capabilities` assertion has been failing on `main` since those PRs landed. Append the three tags in the same positions as `SERVE_CAPABILITY_REGISTRY` and the unit-level constant (`workspace_memory` + `workspace_agents` after `workspace_providers`, `workspace_file_read` after `mcp_guardrails`). No production code changes — same shape as #4268.
Summary
GET /file|/list|/glob|/stat) consuming PR 18'sWorkspaceFileSystemboundary, advertised under a singleworkspace_file_readcapability tag.runQwenServeto construct + injectfsFactoryso the bridge's audit hooks plumb through to PR 19 routes; trust snapshot defaults totrue(operator-chosen workspace).boundWorkspaceand emits a literalpatternfield on both success and parse-rejection paths.Closes the in-flight items from #4175 Wave 4:
runQwenServe → fsFactoryinjection contract test)pathHash+patternfield)Routes
GET /file?path=...?maxBytessoft-truncates withtruncated:true;?line+?limitwindowGET /list?path=...{name, kind, ignored}entries (no synthesized paths); 2000-entry cap withtruncated:true;?includeIgnored=1opt-inGET /glob?pattern=...?cwd+?maxResults(default 5000, max 50000); rejects..segments + non-relative patternsGET /stat?path=...{type, sizeBytes, modifiedMs}All four set
Cache-Control: no-store+X-Content-Type-Options: nosniffand route errors through a sharedsendFsErrorenvelope ({errorKind, error, hint?, status}) usingFsError.statusfrom PR 18'sDEFAULT_STATUS_BY_KIND.Trust gate is unchanged: read intents pass on untrusted workspaces. Auth follows the global bearer flow only; read routes never run
mutate().Out of scope (PR 20+)
POST /file/write+POST /file/edit(PR 20, with atomic-via-temp +expectedHash)GET /file/bytesfor binary content (PR 20)fs.access/fs.deniedto clients (PR 21)Test plan
pnpm vitest src/serve— 511/511 pass (14 files, includes newroutes/workspaceFileRead.test.tswith 19 cases)pnpm vitest src/serve/httpAcpBridge— 133/133 pass (regression check)pnpm tsc --noEmit -p packages/cli/tsconfig.json— clean withinsrc/serve/pnpm eslint packages/cli/src/serve— cleanqwen serve --workspace . --port 0then curl all four routes (reviewer-runnable)🤖 Generated with Qwen Code