feat(memory): add autoSkill background project skill extraction#3673
Conversation
Skill Nudge E2E 测试报告1. 执行摘要测试结果概览
2. 设计文档检查清单根据设计文档 L258-320 的 8 项需求,测试覆盖情况如下:
覆盖结论: 8/8 设计文档需求已验证 ✅ 3. 详细测试结果Test 1: 低工具调用密度不应触发技能评审目的: 验证
Test 2: 到达或超过阈值应触发技能评审目的: 验证
Test 3: 历史中 skill_manage 函数调用应阻止触发目的: 验证最近已使用 skill_manage 时跳过
Test 4: 配置启用/禁用门控目的: 验证 skillNudge.enabled 配置正确控制行为
Test 5: Extract + Skill Review 合并检测目的: 验证待处理的 extract 任务检测和独立调度
Test 6: 任务记录追踪和元数据目的: 验证任务元数据正确记录和状态转换
Test 7: 阈值边界情形目的: 验证 threshold-1, threshold, threshold+1 的行为差异
边界逻辑验证: 阈值在 |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…d, recordCompletedToolCall, consumePendingMemoryTaskPromises)
- Fix merged_with_extract silent drop: remove broken merge optimization in scheduleSkillReview(). When an extract task is pending/running, skill review is now scheduled independently instead of recording metadata that no production code ever reads. - Fix SKILL_MANAGE blocked from skill review agent: prepareTools() now only enforces the recursion guard (AGENT tool) when the agent has an explicit tools list. Wildcard/inherit subagents still get the full EXCLUDED_TOOLS_FOR_SUBAGENTS filter, preventing task subagents from calling skill_manage. The dedicated skill review agent can now receive the skill_manage tool it requires. - Update manager.test.ts: replace merged_with_extract tests with concurrent-extract independent-scheduling tests. - Update skill-manage.test.ts: clarify test description to reflect wildcard-only exclusion semantics.
assertProjectSkillPath() uses path.resolve() which is purely lexical and does not dereference symlinks. If any path component inside .qwen/skills/ is a symlink pointing outside the project, fs.writeFile/readFile/rm would follow the link and mutate files outside the advertised write boundary. Add assertRealProjectSkillPath() (async) in skill-paths.ts that: - Resolves the real path of the skills root via fs.realpath() - Walks up from targetPath to find the nearest existing ancestor - Resolves that ancestor to its real filesystem path - Rejects if the real path falls outside the real skills root skill-manage.ts execute() now calls both the cheap lexical check (fast fail for obviously wrong paths) and the async real-path check before any fs.writeFile / fs.rm mutation. Add three symlink-specific tests in skill-paths.test.ts covering: - Legitimate path accepted - Symlinked directory pointing outside skills root rejected - Skills root itself being a symlink (safe target) accepted
…rite detection Address reviewer feedback: instead of keeping skill_manage as the sole write gate (which still had symlink bypass risk via generic tools), remove the dedicated tool entirely and replace with a two-layer protection: 1. skillsModifiedInSession (client.ts): detects writes to .qwen/skills/ by inspecting the file_path arg of every completed tool call, replacing the fragile historyCallsSkillManage() history scan. 2. hasAutoSkillSource + evaluateScopedDecision (skillReviewAgentPlanner.ts): the review agent's permission sandbox now verifies BOTH that the target path is inside the skills directory AND that the existing file already contains 'source: auto-skill' in its frontmatter before allowing edits, preventing the agent from overwriting user-managed skills. Changes: - Delete skill-manage.ts and skill-manage.test.ts - Remove SKILL_MANAGE from ToolNames, ToolDisplayNames, config registerLazy, agent-core EXCLUDED_TOOLS comment, and agent.ts comment - Replace historyCallsSkillManage() with skillsModified: boolean param in scheduleSkillReview; skip reason renamed skills_modified_in_session - recordCompletedToolCall(name, filePath?) detects .qwen/skills/ writes; CLI layers pass file_path arg from tool call request - Fix buildTaskPrompt frontmatter template to use top-level source: auto-skill - Update skill-paths.ts error messages to remove skill_manage references - Update all unit/integration tests accordingly
…Root
scheduleSkillReview() was launching a new background task every time the
threshold was reached for the same project, with no guard against multiple
in-flight reviews running concurrently.
Fix: add skillReviewInFlightByProject Map that tracks the taskId of any
running review per projectRoot. A second call while one is in-flight returns
{ status: 'skipped', skippedReason: 'already_running', taskId: <existing> }.
The map entry is cleared in a finally block inside runSkillReview() so the
next session can schedule a fresh review after the current one completes.
Also extend SkillReviewScheduleResult.skippedReason union to include
'already_running', and add a unit test covering the full lifecycle:
first call schedules, second call is skipped with existing taskId, and a
third call after completion schedules a new task.
wenshao
left a comment
There was a problem hiding this comment.
Note: CI is currently failing on all 9 Test jobs (ubuntu/windows/macos × node 20/22/24). One of the integration tests in this PR is itself the cause — see the inline comment on skillReviewNudge.integration.test.ts:279. — claude-opus-4-7 via Qwen Code /review
| } | ||
| case ToolNames.EDIT: | ||
| case ToolNames.WRITE_FILE: { | ||
| if (!ctx.filePath || !isProjectSkillPath(ctx.filePath, projectRoot)) { |
There was a problem hiding this comment.
[Critical] This is the same lexical-only check the previous reviewer flagged — assertRealProjectSkillPath (the symlink-aware async guard added in 60fbe2c) is never called from production code. grep -rn assertRealProjectSkillPath packages/core/src --include='*.ts' | grep -v '\.test\.' returns only the definition site in skill-paths.ts:48. Both client.ts:612 and the gates here at :100 and :260 use lexical isProjectSkillPath, which path.resolves but does not dereference symlinks. FileSystemService.writeTextFile calls plain fs.writeFile — symlinks are followed.
Concrete attack flow (the new-file edge case):
- Project ships
.qwen/skills/foo.mdas a dangling symlink to~/.zshrc(target doesn't exist yet). .qwen/settings.jsonships with{"memory":{"enableAutoSkill": true}}— once the user trusts the folder, this is honored.- After a single 20+ tool-call turn, the auto-skill agent picks
foo.mdto write. hasAutoSkillSource(filePath)follows the symlink → ENOENT → returnsnull→ permission'allow'("file doesn't exist, OK to create").fs.writeFilefollows the symlink and creates~/.zshrcwith the model's skill-body content. The user sees a friendly "Skill review updated 1 file(s)" toast.
Also missing: assertRealProjectSkillPath itself doesn't verify the resolved skills root sits inside the project — .qwen/skills symlinked to / would pass the containment check (every absolute path starts with /).
Fix:
- Replace
isProjectSkillPathhere (and at:260and inclient.ts:612) withawait assertRealProjectSkillPath(ctx.filePath, projectRoot)wrapped in try/catch (catch → return 'deny'). - Add an early check in
assertRealProjectSkillPaththatrealpath(skillsRoot)is insiderealpath(projectRoot). - To close the inevitable TOCTOU between check and write, open the file with
fs.constants.O_NOFOLLOW(or write to a temp file in the same dir thenrename(2)).
— claude-opus-4-7 via Qwen Code /review
| ctx: PermissionCheckContext, | ||
| projectRoot: string, | ||
| ): Promise<PermissionDecision> { | ||
| switch (ctx.toolName) { |
There was a problem hiding this comment.
[Critical] Two scoping holes in evaluateScopedDecision:
1. READ_FILE / LS are in the agent's allowlist (:242-248) but evaluateScopedDecision only handles SHELL/EDIT/WRITE_FILE — read tools fall to 'default', which the base PermissionManager allows for project paths and (under YOLO mode forced by runForkedAgent) auto-allows for outside-workspace paths too. The auto-skill agent inherits the parent session's full conversation history. A hostile user prompt or malicious tool-result text in that history can instruct the agent to read_file ~/.aws/credentials and embed the contents inside the SKILL.md it writes to .qwen/skills/. That file is in the user's project — which is often committed to git. Net: read-and-exfiltrate via auto-generated skill markdown.
2. SHELL is gated only by isShellCommandReadOnlyAST at :89-97. AST-based read-only classification is fragile against process substitution (<(cmd)), command substitution ($(rm x)), eval, here-docs writing to fd, etc. For an unsupervised background agent that doesn't actually need shell, the conservative posture is to deny SHELL entirely — read_file / ls cover the system-prompt's stated use case ("inspect existing skills before writing").
Fix:
- Add
READ_FILE/LScases toevaluateScopedDecision: only'allow'whenctx.filePathresolves insideprojectRoot(real-path);'deny'otherwise. - Drop
SHELLfrom the tool list at:242-248. If a specific shell command is needed later (ls -la), expose it through an explicit allowlist not the AST.
— claude-opus-4-7 via Qwen Code /review
| * `source: auto-skill` in its YAML frontmatter block. | ||
| * Returns false if the file does not exist (caller may allow creation). | ||
| */ | ||
| async function hasAutoSkillSource(filePath: string): Promise<boolean | null> { |
There was a problem hiding this comment.
[Critical] hasAutoSkillSource has three issues that combine into a containment bypass for new files:
1. Error swallowing too broad. The catch returns null for any fs.readFile error (ENOENT / EISDIR / EACCES / EMFILE / EPERM). Only ENOENT means "file does not exist; safe to create." The others mean "I can't tell" — narrowing them to 'allow' is fail-open.
2. New-file path allows any content. When null is returned, evaluateScopedDecision emits 'allow' regardless of what the agent is about to write. A model (steered by prompt injection in the parent session) can create new files in .qwen/skills/ without including source: auto-skill, permanently labeling the file as user-authored and immune to future auto-skill updates. The pre-write gate validates the on-disk content, not the proposed content; the protection relies entirely on the system prompt convincing the model to add the marker.
3. Regex /m flag matches frontmatter mid-file. /^---\r?\n([\s\S]*?)\r?\n---/m — with m, ^ matches the start of any line. A user-written skill that contains a ---\nsource: auto-skill\n--- block anywhere (e.g., inside a markdown example demonstrating the feature) is classified auto-managed and editable.
Fix:
} catch (err) {
if ((err as NodeJS.ErrnoException).code === 'ENOENT') return null;
return false; // any other read error → deny
}Drop the m flag (or anchor the match to byte 0). For new files, also require ctx.toolParams?.content (plumbed through PermissionCheckContext) to contain the marker — or post-write hook to roll back skill files lacking the marker.
— claude-opus-4-7 via Qwen Code /review
| // Both should be processed independently | ||
| expect(result1.taskId).toBeDefined(); | ||
| expect(result2.taskId).toBeDefined(); | ||
| expect(result1.taskId).not.toBe(result2.taskId); |
There was a problem hiding this comment.
[Critical] This test is currently failing on the PR branch:
× Test 5 > should handle multiple skill reviews for same project
→ expected 'cf1566...' not to be 'cf1566...' // Object.is equality
The in-flight dedupe added by f7601a0 makes the second scheduleSkillReview for the same project return the same taskId (with skippedReason: 'already_running'), but this test asserts expect(result1.taskId).not.toBe(result2.taskId) — the inverse. The test was written before the dedupe fix and never updated. As a result CI is red, and any future regression in dedupe would be masked because this test currently fails for an unrelated reason.
Fix: Update to assert result2.taskId === result1.taskId and result2.skippedReason === 'already_running'. Or delete this test — manager.test.ts:278 already covers the behavior correctly.
— claude-opus-4-7 via Qwen Code /review
| return promises; | ||
| } | ||
|
|
||
| recordCompletedToolCall(toolName: string, filePath?: string): void { |
There was a problem hiding this comment.
[Critical] recordCompletedToolCall recognizes only 'write_file'/'edit' and only inspects args.file_path. The sibling partWritesToMemory in manager.ts:195-200,237-248 (used to gate the extract task) recognizes 4 tool names (write_file, edit, replace, create_file) and 3 arg keys (file_path, path, target_file). The two sit ~370 lines apart in the same conceptual subsystem.
Concrete failure: A replace tool call (legacy alias for edit per tool-names.ts) that writes to .qwen/skills/foo/SKILL.md will not flip skillsModifiedInSession. The session's auto-skill agent then runs over a session that just wrote skills — exactly the scenario the flag was designed to suppress. Same for any future tool that uses path or target_file (e.g., MCP file-write servers).
This is also a test-coverage gap: no client.test.ts case asserts that recordCompletedToolCall('write_file', '<projectRoot>/.qwen/skills/foo.md') flips the flag.
Fix: Extract a shared extractWrittenFilePath(toolName, args): string | undefined helper used by both call sites. Use the ToolNames.WRITE_FILE/ToolNames.EDIT constants instead of raw strings (the file already imports ToolNames).
— claude-opus-4-7 via Qwen Code /review
| toolCallCount: this.toolCallCount, | ||
| skillsModified: this.skillsModifiedInSession, | ||
| enabled: autoSkillEnabled, | ||
| threshold: 20, |
There was a problem hiding this comment.
[Suggestion] threshold: 20, maxTurns: 8, timeoutMs: 120_000 are hardcoded literals at this call site, shadowing the named constants that are exported from manager.ts:187 (AUTO_SKILL_THRESHOLD) and skillReviewAgentPlanner.ts:26-27 (DEFAULT_AUTO_SKILL_MAX_TURNS, DEFAULT_AUTO_SKILL_TIMEOUT_MS). A maintainer who tunes AUTO_SKILL_THRESHOLD and runs the test suite (which uses the constant directly) sees green tests but ships a behavior bug because production reads the literal here.
Also enabled: autoSkillEnabled is dead — the call site is already inside if (autoSkillEnabled) { ... } two lines above, so this field is always true at this point.
Fix: Drop the four explicit fields and let scheduleSkillReview use its built-in defaults; or import the constants and pass them by reference. Either way, do not mix.
— claude-opus-4-7 via Qwen Code /review
| ); | ||
| } | ||
| } | ||
| this.toolCallCount = 0; |
There was a problem hiding this comment.
[Suggestion] toolCallCount = 0 and skillsModifiedInSession = false are reset every turn (at the end of runManagedAutoMemoryBackgroundTasks, which fires per UserQuery/ToolResult). The PR description and design doc (docs/design/skill-nudge/skill-nudge.md) explicitly say the counter is session-scoped and only resets at session start. The implementation makes it turn-scoped.
Two observable consequences:
- A single 20-tool-call turn fires immediately (matches the threshold, but feels surprising while the user is still in the session).
- A multi-turn session totaling 200 tool calls (none ≥ 20 in a single turn) never fires — contrary to the design intent.
Also, skillsModifiedInSession reset semantics: if the user edits a skill file in turn N (the gate at runManagedAutoMemoryBackgroundTasks skips that turn but then resets the flag), then turn N+1 with 20 unrelated tool calls will fire a review even though the user just hand-modified a skill.
Fix: Either (a) move the reset to a real session-end / /clear hook to match the design, or (b) update the design doc to per-turn semantics and rename the field to skillsModifiedThisTurn. Whichever the team picks, pin it in a test.
— claude-opus-4-7 via Qwen Code /review
| 'Managed skill review completed without durable changes.', | ||
| metadata: { touchedSkillFiles: result.touchedSkillFiles }, | ||
| }); | ||
| } catch (error) { |
There was a problem hiding this comment.
[Suggestion] Two divergences from the sibling runExtract path (and a debuggability gap):
1. Silent failure. runExtract (lines 600-667) rethrows after marking the record failed and emits a MemoryExtractEvent telemetry event for both completed and failed cases. runSkillReview swallows the error inside its catch (records record.error, never rethrows, never emits telemetry). Consumers awaiting skillReviewResult.promise can only learn about a failure by inspecting record.status === 'failed' — the promise never rejects.
2. No telemetry. There is no SkillReviewEvent analogous to MemoryExtractEvent. Combined with: no debugLogger calls in skillReviewAgentPlanner.ts (compare with how Step 3's deterministic tools log via debugLogger), no Footer.tsx indicator (only dream is shown there), no "Generated by Qwen Code skill review" marker in the SKILL.md body itself — when an autoSkill bug fires in production, the operator has no forensic trail: no session ID, no transcript, no agent prompt, nothing identifying the file as auto-generated except a regex-matched frontmatter field.
Fix:
- Pick one error-handling style. If "swallow" is intentional, add a comment here pointing readers to inspect
record.status. - Add a minimal
MemorySkillReviewEvent(or reuselogMemoryExtractwith ataskType: 'skill-review'discriminator) so failures are observable. - Embed a
<!-- Generated by Qwen Code skill review at <sessionId>/<timestamp> -->HTML comment at the top of every auto-generated SKILL.md so files are identifiable even if frontmatter is hand-edited away. - Surface
skill-reviewstatus inFooter.tsxnext todream.
— claude-opus-4-7 via Qwen Code /review
1. hasAutoSkillSource: narrow catch to ENOENT only (EISDIR/EACCES etc. return false to deny); tighten frontmatter regex to match opening block only. 2. evaluateScopedDecision: add explicit allow for READ_FILE and LS so they don't fall to 'default' which the base PermissionManager might widen; EDIT/WRITE_FILE now call assertRealProjectSkillPath() (async realpath guard) in addition to the lexical check, closing the symlink traversal hole. 3. isScopedTool / getScopedDenyRule: cover READ_FILE and LS so hasRelevantRules returns true and findMatchingDenyRule is correctly consulted for them. 4. recordCompletedToolCall (client.ts): broaden tool name set to match WRITE_TOOL_NAMES in manager.ts (write_file, edit, replace, create_file) and inspect all three arg keys (file_path, path, target_file). Signature changed from (name, filePath?) to (name, args?) to carry all args through. 5. client.ts hardcoded literals: replace threshold/maxTurns/timeoutMs with the named constants AUTO_SKILL_THRESHOLD / DEFAULT_AUTO_SKILL_MAX_TURNS / DEFAULT_AUTO_SKILL_TIMEOUT_MS imported from manager.ts and skillReviewAgentPlanner.ts. 6. toolCallCount / skillsModifiedInSession reset: only reset when skill review is actually scheduled (status === 'scheduled'), not every turn, so the counter correctly accumulates across turns within a session as per design doc. 7. runSkillReview (manager.ts): rethrow after marking record failed, consistent with runExtract behavior. 8. skillReviewNudge.integration.test.ts test 5: rewrite to reflect the in-flight dedup contract (second same-project call returns already_running with existing taskId; third call after completion gets a new task). Add vi.mock for runSkillReviewByAgent so the test does not need a full Config.
| this.toolCallCount = 0; | ||
| this.skillsModifiedInSession = false; | ||
| if (skillReviewResult.promise) { | ||
| this.pendingMemoryTaskPromises.push( |
There was a problem hiding this comment.
[Critical] runSkillReview() now rethrows background-agent failures, but this promise is pushed with only .then(...). In the interactive path, useGeminiStream also consumes pending memory-task promises with only void p.then(...), so an autoSkill timeout/API/permission failure can surface as an unhandled rejection instead of being treated as best-effort background work like extract/dream tasks.
| this.pendingMemoryTaskPromises.push( | |
| this.pendingMemoryTaskPromises.push( | |
| skillReviewResult.promise | |
| .then((record) => { | |
| const touched = record.metadata?.['touchedSkillFiles']; | |
| return Array.isArray(touched) ? touched.length : 0; | |
| }) | |
| .catch((error: unknown) => { | |
| debugLogger.warn( | |
| 'Failed to run managed skill review.', | |
| error, | |
| ); | |
| return 0; | |
| }), | |
| ); |
— gpt-5.5 via Qwen Code /review
| case ToolNames.LS: | ||
| // Read tools are always allowed — the agent needs to inspect skills. | ||
| return 'allow'; | ||
| case ToolNames.SHELL: { |
There was a problem hiding this comment.
[Critical] This allows any shell command classified as read-only without constraining the command's cwd or referenced paths to .qwen/skills. Even if read_file / list_directory are scoped separately, the background skill-review agent can still read arbitrary local files through commands like cat, grep, find, or ls, which bypasses the intended project-skills-only inspection model.
Either remove ToolNames.SHELL from the skill-review agent tools, or make this approval path-aware and realpath-validated under getProjectSkillsRoot(projectRoot) before returning allow.
— gpt-5.5 via Qwen Code /review
| // Resolve the real path of the skills root (it may itself be a symlink). | ||
| let realSkillsRoot: string; | ||
| try { | ||
| realSkillsRoot = await fs.realpath(skillsRoot); |
There was a problem hiding this comment.
[Critical] A dangling symlink at the final target path is still treated as a safe missing file. For example, if .qwen/skills/foo/SKILL.md is an existing symlink to /tmp/pwned-skill.md whose target does not exist, fs.realpath(Skill.md) throws ENOENT, this helper walks to the parent and returns success, and hasAutoSkillSource() also treats readFile ENOENT as creation. write_file then follows the symlink and writes outside the skills root.
Before treating ENOENT as safe, lstat() the current path; if the path itself exists and is a symlink, deny it or resolve readlink() and verify the resolved target stays under the real skills root. Also only let hasAutoSkillSource() ENOENT mean creation after the symlink-aware check proves the final path is not an existing symlink.
— gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No-unit-test gaps (can't map to diff lines):
packages/core/src/memory/skillReviewAgentPlanner.ts(294 lines): entire file has no dedicated unit tests. Permission sandbox logic (evaluateScopedDecision,hasAutoSkillSource,mergePermissionDecision) is untested.packages/core/src/core/client.ts:617-641(recordCompletedToolCall): tool counting andskillsModifiedInSessiondetection logic has no direct unit tests.
— deepseek-v4-pro via Qwen Code /review
|
|
||
| if (!this.config.getManagedAutoMemoryEnabled()) { | ||
| if ( | ||
| messageType !== SendMessageType.UserQuery && |
There was a problem hiding this comment.
[Critical] runManagedAutoMemoryBackgroundTasks gate expanded from UserQuery-only to UserQuery || ToolResult, but the gate change applies to the entire method — not just the autoSkill block. This causes scheduleExtract() and scheduleDream() to fire on every tool result submission, not just on user-initiated turns.
Each call triggers getHistory() (deep clone of entire conversation) and file I/O checks. A session with 20 tool calls now triggers 20 extract+dream attempts instead of 1.
| messageType !== SendMessageType.UserQuery && | |
| private runManagedAutoMemoryBackgroundTasks( | |
| messageType: SendMessageType, | |
| ): void { | |
| // autoSkill: can trigger on UserQuery or ToolResult | |
| if ( | |
| messageType === SendMessageType.UserQuery || | |
| messageType === SendMessageType.ToolResult | |
| ) { | |
| if (this.config.getAutoSkillEnabled()) { | |
| // ... scheduleSkillReview logic (existing) | |
| } | |
| } | |
| // extract/dream: keep UserQuery-only to preserve existing behavior | |
| if (messageType !== SendMessageType.UserQuery) return; | |
| if (!this.config.getManagedAutoMemoryEnabled()) return; | |
| // ... scheduleExtract, scheduleDream (existing) | |
| } |
— deepseek-v4-pro via Qwen Code /review
| timeoutMs: DEFAULT_AUTO_SKILL_TIMEOUT_MS, | ||
| }); | ||
| if (skillReviewResult.status === 'scheduled') { | ||
| // Reset per-session counters only when review is actually dispatched, |
There was a problem hiding this comment.
[Critical] skillsModifiedInSession is an irreversible deadlock: once set to true by any write to .qwen/skills/, it can never be reset. The reset (this.skillsModifiedInSession = false at line 546) is inside if (skillReviewResult.status === 'scheduled'), but scheduleSkillReview() checks skillsModified before reaching the scheduling path and returns 'skipped' immediately. One skill write permanently disables autoSkill for the entire session.
| // Reset per-session counters only when review is actually dispatched, | |
| if (skillReviewResult.status === 'scheduled') { | |
| this.toolCallCount = 0; | |
| if (skillReviewResult.promise) { | |
| this.pendingMemoryTaskPromises.push( | |
| skillReviewResult.promise.then((record) => { | |
| const touched = record.metadata?.['touchedSkillFiles']; | |
| return Array.isArray(touched) ? touched.length : 0; | |
| }).catch((error: unknown) => { | |
| debugLogger.warn('Failed to run managed skill review.', error); | |
| return 0; | |
| }), | |
| ); | |
| } | |
| } | |
| // Always reset to avoid deadlock — any write to skills dir in this | |
| // turn was already captured by scheduleSkillReview's check above. | |
| this.skillsModifiedInSession = false; |
— deepseek-v4-pro via Qwen Code /review
| enableManagedAutoMemory?: boolean; | ||
| /** Enable managed auto-dream consolidation separately from extraction. Defaults to true. */ | ||
| enableManagedAutoDream?: boolean; | ||
| /** Enable automatic project skill review after tool-heavy sessions. Defaults to true. */ |
There was a problem hiding this comment.
[Critical] JSDoc says Defaults to true. but the Config constructor (this.enableAutoSkill = params.enableAutoSkill ?? false), settingsSchema.ts (default: false), and loadCliConfig all default to false. The PR description also says「默认关闭」. The JSDoc is misleading.
| /** Enable automatic project skill review after tool-heavy sessions. Defaults to true. */ | |
| /** Enable automatic project skill review after tool-heavy sessions. Defaults to false. */ | |
| enableAutoSkill?: boolean; |
— deepseek-v4-pro via Qwen Code /review
| toolName: string, | ||
| args?: Record<string, unknown>, | ||
| ): void { | ||
| const SKILL_WRITE_TOOL_NAMES = new Set([ |
There was a problem hiding this comment.
[Suggestion] Two issues with the write-tool detection: (1) SKILL_WRITE_TOOL_NAMES is a new Set() allocated on every recordCompletedToolCall invocation. (2) It contains 'replace' and 'create_file' which don't exist in QwenCode's tool registry — likely copy-paste residue. manager.ts:200 already has a module-level WRITE_TOOL_NAMES constant with correct values. Also, run_shell_command with redirects (e.g. echo ... > .qwen/skills/evil/SKILL.md) bypasses this detection entirely.
| const SKILL_WRITE_TOOL_NAMES = new Set([ | |
| // Module-level (outside method): | |
| const SKILL_WRITE_TOOL_NAMES: ReadonlySet<string> = new Set([ | |
| 'write_file', | |
| 'edit', | |
| ]); |
— deepseek-v4-pro via Qwen Code /review
| }, | ||
| }); | ||
|
|
||
| const promise = this.track(record.id, this.runSkillReview(record, params)); |
There was a problem hiding this comment.
[Suggestion] runSkillReview() has no telemetry at all. Contrast with runExtract() which emits MemoryExtractEvent on both success and failure paths. Without telemetry, there's no way to monitor skill review reliability or effectiveness in production.
Consider adding a telemetry event recording status, touchedSkillFiles, and duration_ms.
— deepseek-v4-pro via Qwen Code /review
| const skillsRoot = path.resolve(getProjectSkillsRoot(projectRoot)); | ||
| const resolved = path.resolve(filePath); | ||
| return resolved === skillsRoot || resolved.startsWith(skillsRoot + path.sep); | ||
| } |
There was a problem hiding this comment.
[Suggestion] isProjectSkillPath calls path.resolve(filePath) which resolves relative paths against process.cwd(), not projectRoot. In multi-root workspaces or when cwd differs from project root, a relative filePath could resolve outside the skills containment check.
| } | |
| const resolved = path.resolve(projectRoot, filePath); |
— deepseek-v4-pro via Qwen Code /review
| ); | ||
| } else { | ||
| // Explicit tool list: only prevent recursive agent spawning. | ||
| toolsList.push( |
There was a problem hiding this comment.
[Suggestion] For subagents with explicit tool lists, the exclusion set was narrowed from EXCLUDED_TOOLS_FOR_SUBAGENTS (blocks AGENT, CRON_CREATE, CRON_LIST, CRON_DELETE, TASK_STOP, SEND_MESSAGE) to just recursionGuardOnly (only AGENT). This means control-plane tools like TASK_STOP and CRON_CREATE could leak to explicitly-configured subagents. While the skill review agent doesn't declare these, this weakens the safety net for future subagent configs.
Consider restoring the full exclusion or documenting the rationale.
— deepseek-v4-pro via Qwen Code /review
| } | ||
| this.toolCallCount += 1; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] toolCallCount increments for all completed tool calls, including read-only operations (list_directory, read_file, grep_search). A simple "read one file and edit one line" task can produce 5-10 tool calls. In practice, the threshold of 20 is reached much sooner than the design doc's intended "tool-heavy session" semantics suggest.
Consider weighting write operations higher, or counting only non-read-only tools.
— deepseek-v4-pro via Qwen Code /review
Code ReviewOverviewAdds a background "skill review" agent that runs after tool-heavy sessions (≥ 20 tool calls) to extract reusable procedural skills into High-priority issues1. Silent behavior change to existing
|
- skill-paths: detect dangling symlinks with lstat before treating ENOENT as safe - skill-paths: fix isProjectSkillPath relative path resolution to use projectRoot - skillReviewAgentPlanner: restrict READ_FILE/LS to project root only - skillReviewAgentPlanner: remove SHELL tool from review agent tool list - skillReviewAgentPlanner: add path import; remove unused shell imports - skillReviewAgentPlanner: add comment for buildAgentHistory trailing user message - client: fix runManagedAutoMemoryBackgroundTasks gate widening - client: fix skillsModifiedInSession deadlock - client: add .catch() to skill review promise - client: hoist SKILL_WRITE_TOOL_NAMES to module-level ReadonlySet - agent-core: use full EXCLUDED_TOOLS_FOR_SUBAGENTS for explicit tool list subagents - manager: extend notify() signature to accept 'skill-review' taskType - config: fix JSDoc default value comment (false, not true)
wenshao
left a comment
There was a problem hiding this comment.
- autoSkill 分支完全未经测试 —
client.test.ts中getAutoSkillEnabled始终 mock 为false,runManagedAutoMemoryBackgroundTasks的整个 autoSkill 路径(调度、threshold 检查、promise 处理、计数器重置)从未被执行。 recordCompletedToolCall完全未经测试 — 方法有零测试覆盖率。skillsModifiedInSession的唯一设置机制未经验证。nonInteractiveCli.ts新增调用无断言 —recordCompletedToolCall和consumePendingMemoryTaskPromises有 mock 定义但无断言验证调用。useGeminiStream.ts新增循环无断言 —recordCompletedToolCall的 for 循环有 mock 无断言,TUI 模式下 tool 计数无效。scheduleSkillReview去重粒度与scheduleExtract不一致 — extract 排队尾随请求,skill-review 直接丢弃。若审查飞行期间再次达到阈值,后续调用被静默跳过且toolCallCount不重置。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| @@ -148,6 +148,8 @@ describe('runNonInteractive', () => { | |||
|
|
|||
| mockGeminiClient = { | |||
There was a problem hiding this comment.
[Critical] TS2353: consumePendingMemoryTaskPromises 在 mock 对象类型中不存在。GeminiClient 新增了此方法(第 762 行),但 mock 类型未包含它,导致 tsc --noEmit 编译失败。
| mockGeminiClient = { | |
| consumePendingMemoryTaskPromises: vi.fn().mockReturnValue([]), | |
| recordCompletedToolCall: vi.fn(), |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| @@ -166,6 +192,8 @@ export interface DrainOptions { | |||
|
|
|||
| export const EXTRACT_TASK_TYPE = 'managed-auto-memory-extraction' as const; | |||
There was a problem hiding this comment.
[Critical] SKILL_REVIEW_TASK_TYPE = 'managed-skill-extractor' 被导出但从未使用。实际的 makeTaskRecord 调用使用字面量 'skill-review'。两者值不同——要么常量的值错误,要么 record 创建错误地绕过了常量。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| messageType === SendMessageType.UserQuery || | ||
| messageType === SendMessageType.ToolResult | ||
| ) { | ||
| const projectRoot = this.config.getProjectRoot(); |
There was a problem hiding this comment.
[Suggestion] this.getHistory()(含 structuredClone 深拷贝)在每个 ToolResult 轮次中被无条件调用,即使 autoSkillEnabled 为 false(默认值)。将 getHistory() 移入 autoSkillEnabled 守卫内部,避免对使用默认配置的用户造成不必要的 O(n) 性能开销。
| const projectRoot = this.config.getProjectRoot(); | |
| if (autoSkillEnabled && this.toolCallCount >= AUTO_SKILL_THRESHOLD) { | |
| const history = this.getHistory(); | |
| const skillReviewResult = mgr.scheduleSkillReview({ | |
| ... | |
| }); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| @@ -444,9 +475,9 @@ export class MemoryManager { | |||
| * subscribers can be reached too; the unfiltered subscriber set | |||
| * always receives the wakeup either way. | |||
There was a problem hiding this comment.
[Suggestion] notify() 的 taskType 参数包含 'skill-review',并通过 taskType !== 'skill-review' 显式跳过 typed subscribers。但 subscribe() 的 taskType 只接受 'extract' | 'dream',不包含 'skill-review'。这是一个无法通过公共 API 到达的代码路径——要么移除跳过守卫,要么扩展 subscribe 以接受 'skill-review'。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| if ( | ||
| resolvedRead === resolvedRoot || | ||
| resolvedRead.startsWith(resolvedRoot + path.sep) | ||
| ) { |
There was a problem hiding this comment.
[Suggestion] evaluateScopedDecision 中 READ_FILE 和 LS 的范围是整个项目根目录,而非仅 .qwen/skills/。Skill review agent 可以读取项目中的任意文件(如 .env、配置文件),并将内容嵌入生成的 skill 文件,可能造成密钥泄露。
建议将 READ_FILE 限制为 .qwen/skills/,或至少添加 deny 规则阻止常见密钥文件名。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| const toolsList: FunctionDeclaration[] = []; | ||
|
|
||
| const excludedFromSubagents = EXCLUDED_TOOLS_FOR_SUBAGENTS; | ||
| // When a subagent has an explicit tools list (not wildcard), only the |
There was a problem hiding this comment.
[Suggestion] recursionGuardOnly 的注释与实际逻辑矛盾。顶部注释说「When a subagent has an explicit tools list...only the recursive-spawn guard is enforced」,但 else 分支内部却写着「Wildcard tool list...only block recursive spawning」——而这段代码实际执行的是显式工具列表路径。两个注释互相矛盾,会误导未来的维护者。
| // When a subagent has an explicit tools list (not wildcard), only the | |
| // Explicit tool list: only block recursive agent spawning. | |
| // Other control-plane tools (cron, task, send_message) are | |
| // allowed since the explicit list already bounds the agent. | |
| const recursionGuardOnly = new Set<string>([ToolNames.AGENT]); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
- client: reset toolCallCount when scheduleSkillReview returns already_running and count >= threshold, preventing immediate cascade after in-flight review - client.test: add autoSkill branch tests (scheduled/already_running/skills_modified) - client.test: add full recordCompletedToolCall unit tests (skillsModifiedInSession, toolCallCount increment, skill path detection for write_file/edit/read_file) - client.test: add scheduleSkillReview mock to mockMemoryManager - nonInteractiveCli.test: add assertions for recordCompletedToolCall and consumePendingMemoryTaskPromises in tool-call integration test
|
感谢所有 review 意见!已在最新两个提交( Round 1 修复(
|
| 问题 | 处理方式 |
|---|---|
C1 悬空符号链接绕过 assertRealProjectSkillPath |
lstat 检测 symlink,在 ENOENT 路径前先拦截 |
C2 autoSkill 门控宽化(ToolResult 也触发 extract/dream) |
拆分为两个独立 if 块,extract/dream 保持 UserQuery only |
C3 skillsModifiedInSession 死锁 |
将 = false 移到 scheduled 分支外,始终重置 |
| C4 Promise 未捕获异常 | 添加 .catch() 并记录 debugLogger.warn |
| C5 subagent 工具列表泄露受限工具 | 使用完整 EXCLUDED_TOOLS_FOR_SUBAGENTS,不再只过滤 AGENT |
| S1 READ_FILE/LS 无路径限制 | evaluateScopedDecision 中添加 projectRoot containment 检查 |
S5 isProjectSkillPath 相对路径基准 |
path.resolve(filePath) → path.resolve(projectRoot, filePath) |
| S7 SHELL 工具 | 从 autoSkill review agent 工具列表中完全移除 |
S8 buildAgentHistory 注释缺失 |
添加说明为何 user role 末尾也需截断 |
| S10 JSDoc 默认值 | 修正 enableAutoSkill 注释:Defaults to true → false |
SKILL_WRITE_TOOL_NAMES |
提升为 module-level ReadonlySet,使用 ToolNames 枚举 |
manager.ts notify() 类型 |
扩展签名接受 'skill-review' taskType |
Round 2 修复(37d83a21d)
| 问题 | 处理方式 |
|---|---|
去重粒度 bug — already_running 时 toolCallCount 不重置 |
当 skippedReason === 'already_running' && count >= threshold 时也重置计数器,防止 in-flight 完成后立即级联触发新 review |
| autoSkill 分支未测试 | 新增 4 个测试用例覆盖 scheduled/skipped/already_running/skillsModified 路径 |
recordCompletedToolCall 未测试 |
新增 5 个单元测试:计数递增、技能路径检测(write_file/edit)、非写入工具不触发 |
mockMemoryManager 缺少 scheduleSkillReview |
添加默认 mock,所有现有测试正常兼容 |
nonInteractiveCli.test.ts 无断言 |
在 tool-call 集成测试中添加 recordCompletedToolCall 和 consumePendingMemoryTaskPromises 断言 |
关于 useGeminiStream.ts 覆盖缺口
useGeminiStream 是一个约 2340 行的复杂 Ink React Hook,目前整个文件没有单元测试文件。为该 hook 添加测试需要完整的 Ink/React 渲染环境,超出本次 PR 的范围。已知 issue,后续可独立创建测试。
测试状态
- ✅ 单元测试:145 个测试通过(
skill-paths、manager、skillReviewNudge.integration、client) - ✅ CLI e2e 非 API 测试:18 个通过(
settings-migration+json-output) ⚠️ 其余 e2e 失败:均为本地无GEMINI_API_KEY环境导致的超时,与本 PR 改动无关(所有失败文件均未被本 PR 修改)⚠️ vscode-ide-companion测试失败:预先存在于main分支,与本 PR 无关(toolcalls/index.test.tsx最后由2fea1d3aa修改,该提交已在 main 上)
1 similar comment
|
感谢所有 review 意见!已在最新两个提交( Round 1 修复(
|
| 问题 | 处理方式 |
|---|---|
C1 悬空符号链接绕过 assertRealProjectSkillPath |
lstat 检测 symlink,在 ENOENT 路径前先拦截 |
C2 autoSkill 门控宽化(ToolResult 也触发 extract/dream) |
拆分为两个独立 if 块,extract/dream 保持 UserQuery only |
C3 skillsModifiedInSession 死锁 |
将 = false 移到 scheduled 分支外,始终重置 |
| C4 Promise 未捕获异常 | 添加 .catch() 并记录 debugLogger.warn |
| C5 subagent 工具列表泄露受限工具 | 使用完整 EXCLUDED_TOOLS_FOR_SUBAGENTS,不再只过滤 AGENT |
| S1 READ_FILE/LS 无路径限制 | evaluateScopedDecision 中添加 projectRoot containment 检查 |
S5 isProjectSkillPath 相对路径基准 |
path.resolve(filePath) → path.resolve(projectRoot, filePath) |
| S7 SHELL 工具 | 从 autoSkill review agent 工具列表中完全移除 |
S8 buildAgentHistory 注释缺失 |
添加说明为何 user role 末尾也需截断 |
| S10 JSDoc 默认值 | 修正 enableAutoSkill 注释:Defaults to true → false |
SKILL_WRITE_TOOL_NAMES |
提升为 module-level ReadonlySet,使用 ToolNames 枚举 |
manager.ts notify() 类型 |
扩展签名接受 'skill-review' taskType |
Round 2 修复(37d83a21d)
| 问题 | 处理方式 |
|---|---|
去重粒度 bug — already_running 时 toolCallCount 不重置 |
当 skippedReason === 'already_running' && count >= threshold 时也重置计数器,防止 in-flight 完成后立即级联触发新 review |
| autoSkill 分支未测试 | 新增 4 个测试用例覆盖 scheduled/skipped/already_running/skillsModified 路径 |
recordCompletedToolCall 未测试 |
新增 5 个单元测试:计数递增、技能路径检测(write_file/edit)、非写入工具不触发 |
mockMemoryManager 缺少 scheduleSkillReview |
添加默认 mock,所有现有测试正常兼容 |
nonInteractiveCli.test.ts 无断言 |
在 tool-call 集成测试中添加 recordCompletedToolCall 和 consumePendingMemoryTaskPromises 断言 |
关于 useGeminiStream.ts 覆盖缺口
useGeminiStream 是一个约 2340 行的复杂 Ink React Hook,目前整个文件没有单元测试文件。为该 hook 添加测试需要完整的 Ink/React 渲染环境,超出本次 PR 的范围。已知 issue,后续可独立创建测试。
测试状态
- ✅ 单元测试:145 个测试通过(
skill-paths、manager、skillReviewNudge.integration、client) - ✅ CLI e2e 非 API 测试:18 个通过(
settings-migration+json-output) ⚠️ 其余 e2e 失败:均为本地无GEMINI_API_KEY环境导致的超时,与本 PR 改动无关(所有失败文件均未被本 PR 修改)⚠️ vscode-ide-companion测试失败:预先存在于main分支,与本 PR 无关(toolcalls/index.test.tsx最后由2fea1d3aa修改,该提交已在 main 上)
wenshao
left a comment
There was a problem hiding this comment.
Critical — test coverage gaps (无法映射到具体 diff 行)
C1. packages/core/src/core/client.test.ts — runManagedAutoMemoryBackgroundTasks 的 ToolResult 触发路径零测试覆盖。autoSkill 块的门控已扩展为 UserQuery || ToolResult(代码注释明确说明"so the threshold can fire mid-session"),但所有 autoSkill 测试仅使用 UserQuery 路径。阈值跨过的关键场景(工具执行中达到 20 次)依赖 ToolResult 路径触发,若有 bug 无法被发现。
C2. packages/core/src/memory/manager.test.ts — runSkillReview 的 catch/error 路径零测试覆盖。runSkillReviewByAgent 始终被 mock 为 resolved,没有任何测试验证 rejection 时 task record 状态正确过渡到 'failed',也没有验证 finally 块中 skillReviewInFlightByProject.delete() 清理是否执行。
S3. packages/cli/src/ui/hooks/useGeminiStream.test.tsx — recordCompletedToolCall 被 mock 为 vi.fn(),但整个测试文件中没有任何断言验证该 mock 被实际调用或传入了正确的工具名和参数。交互式路径中的 toolCallCount 递增和 skillsModifiedInSession 检测完全未经测试。
S4. agent-core.ts / agent.ts — 此 PR 修改了两处子 agent 工具过滤行为(内联 FunctionDeclaration 过滤 + parentToolDecls 过滤),但没有任何回归测试验证这些变更。
| realSkillsRoot = await fs.realpath(skillsRoot); | ||
| } catch { | ||
| // Skills root does not exist yet — nothing to traverse. | ||
| return; |
There was a problem hiding this comment.
[Critical] assertRealProjectSkillPath 在 .qwen/skills/ 目录不存在时提前返回(catch 后直接 return),完全跳过 symlink 遍历检测。如果攻击者将 .qwen/ 替换为指向项目外部的 symlink(例如 .qwen/ → /tmp/evil/),随后 review agent 首次写入 skill 文件时会跟随 symlink 写到项目外。symlink 防护恰好覆盖了最需要防护的场景(目录不存在时的首次写入),却在此路径直接放行。
| return; | |
| // 不应在 skills root 不存在时直接返回。应向上遍历找到最近的已存在祖先目录,解析其真实路径,验证是否在项目根目录内。 | |
| try { | |
| realSkillsRoot = await fs.realpath(skillsRoot); | |
| } catch (err) { | |
| if ((err as NodeJS.ErrnoException).code === 'ENOENT') { | |
| // Walk up to find nearest existing ancestor and verify it's within projectRoot | |
| let ancestor = path.dirname(skillsRoot); | |
| while (true) { | |
| try { | |
| const realAncestor = await fs.realpath(ancestor); | |
| const resolvedRoot = path.resolve(projectRoot); | |
| if (realAncestor !== resolvedRoot && !realAncestor.startsWith(resolvedRoot + path.sep)) { | |
| throw new Error(`Skills root ancestor ${ancestor} resolves outside project`); | |
| } | |
| break; | |
| } catch (innerErr) { | |
| if ((innerErr as NodeJS.ErrnoException).code === 'ENOENT') { | |
| const parent = path.dirname(ancestor); | |
| if (parent === ancestor) break; | |
| ancestor = parent; | |
| continue; | |
| } | |
| throw innerErr; | |
| } | |
| } | |
| return; | |
| } | |
| throw err; | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| }, | ||
| }; | ||
|
|
||
| const scopedConfig = Object.create(config) as Config; |
There was a problem hiding this comment.
[Critical] createSkillScopedAgentConfig 使用 Object.create(config) 创建 scoped config,仅覆盖 getPermissionManager,所有其他 Config 方法(getGeminiClient()、getModel()、getProjectRoot() 等)通过原型链完全继承原始 config,不做任何沙箱限制。如果 runForkedAgent 内部通过 config 访问除 permission manager 之外的任何能力,skill review agent 将获得完整权限。建议创建真正的受限 Config 包装器,显式覆盖所有可能被 fork agent 使用的方法。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| "If nothing is worth saving, just say 'Nothing to save.' and stop.", | ||
| ].join('\n'); | ||
|
|
||
| function buildAgentHistory(history: Content[]): Content[] { |
There was a problem hiding this comment.
[Critical] buildAgentHistory 在 last.role !== 'model' 时无条件 history.slice(0, -1)。如果历史仅包含一条用户消息(headless 单次查询),将产生空数组,且没有任何 warn 日志、遥测事件或跳过原因记录说明为何技能提取未执行。这是零信号的静默故障模式 — 在无头模式下发生崩溃或会话提前终止时,autoSkill 功能彻底失效却无可观测信号。
| function buildAgentHistory(history: Content[]): Content[] { | |
| if (last.role !== 'model') { | |
| debugLogger.warn('buildAgentHistory: dropping trailing user turn, review may have no context', { historyLength: history.length }); | |
| return history.slice(0, -1); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| ]; | ||
| } | ||
|
|
||
| function buildTaskPrompt(skillsRoot: string): string { |
There was a problem hiding this comment.
[Critical] buildTaskPrompt 生成的 SKILL.md frontmatter 模板只包含 name、description、source: auto-skill 和 extracted_at,没有 session_id 字段。sessionId 在整个调用链中可用(ScheduleSkillReviewParams → runSkillReview),但从未传递给 runSkillReviewByAgent 或 buildTaskPrompt。当生成的 SKILL.md 包含错误指令或幻觉步骤时,只能凭时间戳做弱匹配来追溯源对话,所有磁盘上的会话日志都是候选,无确定性关联。建议在 frontmatter 中增加 session_id,并考虑在 runSkillReview 完成时发出 SkillReviewEvent 遥测事件。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| vi.spyOn(client['config'], 'getProjectRoot').mockReturnValue( | ||
| '/project', | ||
| ); | ||
| client.recordCompletedToolCall('edit', { |
There was a problem hiding this comment.
[Suggestion] edit 工具测试使用 { path: '/project/.qwen/skills/my-skill.md' },但生产环境中 edit 工具使用 file_path 参数名(非 path)。此测试验证的是死回退分支 args['path'],而非真实的 file_path 路径。如果 args['file_path'] 处理逻辑有 bug,此测试无法发现。
| client.recordCompletedToolCall('edit', { | |
| client.recordCompletedToolCall('edit', { | |
| file_path: '/project/.qwen/skills/my-skill.md', | |
| }); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| } | ||
| toolsList.push(...onlyInlineDecls); | ||
| // Also filter inline FunctionDeclaration[] passed directly in toolConfig. | ||
| toolsList.push( |
There was a problem hiding this comment.
[Suggestion] 内联 FunctionDeclaration[](onlyInlineDecls)仅按 recursionGuardOnly(只含 AGENT)过滤,而显式字符串工具列表按完整的 excludedFromSubagents(含 CRON_CREATE、TASK_STOP、SEND_MESSAGE 等)过滤。如果通过内联方式传入控制面工具,将绕过子 agent 排除过滤器。建议对 onlyInlineDecls 使用相同的 excludedFromSubagents 集合,保持一致性。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| args?: Record<string, unknown>, | ||
| ): void { | ||
| if (args && SKILL_WRITE_TOOL_NAMES.has(toolName)) { | ||
| const filePath = args['file_path'] ?? args['path'] ?? args['target_file']; |
There was a problem hiding this comment.
[Suggestion] skillsModifiedInSession 检测仅检查 write_file/edit 工具的参数(args['file_path']),无法检测用户通过 shell 工具进行的写入(cp skill.md .qwen/skills/、echo > .qwen/skills/x/SKILL.md)。当用户通过 shell 向 skills 目录写入后继续使用,review agent 仍可能被调度,与设计意图("用户本轮已主动操作 skill,跳过自动 review")不一致。建议作为最小化方案,检测到任何 shell 工具调用时将 skillsModifiedInSession 置为 true。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| ); | ||
|
|
||
| for (const toolCall of geminiTools) { | ||
| geminiClient?.recordCompletedToolCall( |
There was a problem hiding this comment.
[Suggestion] geminiTools 遍历中调用 recordCompletedToolCall 时未过滤已取消(cancelled)的工具调用。已取消的工具从未执行任何实际工作,但会计入 toolCallCount,人工降低技能提炼阈值。且 cancelled 的 write_file/edit 也会设置 skillsModifiedInSession = true,即使用户从未实际提交写入。
| geminiClient?.recordCompletedToolCall( | |
| for (const toolCall of geminiTools.filter(tc => tc.status !== 'cancelled')) { | |
| geminiClient?.recordCompletedToolCall( | |
| toolCall.request.name, | |
| toolCall.request.args as Record<string, unknown>, | |
| ); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| } | ||
| } | ||
|
|
||
| function getScopedDenyRule( |
There was a problem hiding this comment.
[Suggestion] getScopedDenyRule 对 READ_FILE/LS 返回 undefined(无拒绝规则消息),与 EDIT/WRITE_FILE 的信息丰富的拒绝消息形成对比。当 READ_FILE/LS 因 scope 违规被拒绝时,agent 收到的是毫无信息量的通用 'deny',日志中也看不到拒绝原因,调试困难。建议添加类似 ManagedSkillReview(read_file: only within project root ${projectRoot}) 的拒绝规则。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
TLDR
新增 autoSkill 功能:当会话内工具调用次数达到阈值(默认 20 次)时,在会话结束后自动 fork 一个后台 review agent,将本次对话中发现的可复用操作流程提炼为 project-level skill,写入
${projectRoot}/.qwen/skills/。该功能默认关闭(
memory.enableAutoSkill: false),用户需在.qwen/settings.json中显式开启。核心变更:
skill_manage专用工具,review agent 改用通用write_file/edit,写保护改由双层机制实现(直接响应 reviewer 的安全反馈)skillsModifiedInSession路径检测:recordCompletedToolCall(name, filePath?)检测每次工具调用的file_path参数,凡写入.qwen/skills/的操作均将skillsModifiedInSession置为 true,会话结束时跳过 review 调度(防止 review agent 和用户在同一会话内双写冲突)source: auto-skillfrontmatter 双重保护:review agent 权限沙箱在允许 EDIT/WRITE_FILE 前,同时验证目标路径在 skills 目录内且文件已含source: auto-skillfrontmatter,防止 agent 覆盖用户手写技能skillReviewAgentPlanner.ts:构建 skill review agent 的 prompt 和权限沙箱skillNudge.*(4 个字段)简化为memory.enableAutoSkill(单一开关),阈值/轮次/超时全部硬编码Screenshots / Video Demo
N/A — CLI 后台任务,无直接 UI 变化。效果体现在会话结束后
.qwen/skills/目录内新增 SKILL.md 文件。Dive Deeper
设计文档见
docs/design/skill-nudge/skill-nudge.md架构决策:
client.ts中recordCompletedToolCall检测file_path是否指向.qwen/skills/,有写操作则置skillsModifiedInSession=true,会话结束不再触发 reviewskillReviewAgentPlanner.ts中evaluateScopedDecision对 EDIT/WRITE_FILE 同时验证路径 +hasAutoSkillSource()读取文件 frontmatter,两项均通过才 allowscheduleSkillReview检测是否有 pending/running extract 任务,如有则注入shouldReviewSkillsAlso: true,由同一个 forked agent 用 combined prompt 同时完成 memory 和 skill 提炼,避免两次 LLM 调用max_iterations=8、timeoutMs=120_000均硬编码,对外只保留memory.enableAutoSkill开关source: auto-skill,可通过edit追加更新而非全量覆盖Reviewer Test Plan
enableAutoSkill: false跑复杂任务.qwen/skills/无变化/tmp/source: auto-skill的技能Testing Matrix
Unit / Integration Test Results
手动测试结果(macOS)
.qwen/skills/无变化source: auto-skillLinked issues / bugs
N/A — 新功能