feat(prompt): enhance system prompts with global reasoning discipline and iterative planning #4436
📋 Review Summary
This PR enhances system prompts across 4 files to improve the model's global reasoning, iterative planning, agent delegation quality, and task management discipline. The changes are well-crafted, adding structured behavioral constraints without modifying core logic. All 163 affected unit tests pass, and TypeScript type checking succeeds. The prompt additions are thoughtful and address genuine gaps in the existing system.
🔍 General Feedback
🎯 Specific Feedback
✅ Highlights
🔒 Security, Performance, Reliability Notes
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
wenshao
left a comment
No review findings (all issues low-confidence, terminal-only). Downgraded from Approve to Comment: CI failing: Test (windows-latest, Node 22.x). — qwen3.7-max via Qwen Code /review
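PR 4436 local tmux verification report
PR: #4436 feat(prompt): enhance system prompts with global reasoning discipline and iterative planning
1. Overall conclusion
Merge recommendation: ✅ Ready to merge. Pure prompt enhancement, zero logic changes, all tests pass.
2. Verification matrix
3. Change overview (4 files, +58/-5)
3.1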
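| Subsection | Content |
|---|---|
| The Loop | Three-step loop: Explore (read files) → Capture findings (integrate them immediately) → Ask the user (when something is ambiguous) |
| First Turn | Quickly scan the key files to build an initial understanding; do not wait for exhaustive exploration before engaging the user |
| Asking Good Questions | Four principles: do not ask what you can find in the code yourself, batch related questions, focus on information only the user has, scale question frequency to task depth |
| Planning Principles | Four principles: global understanding before local edits, fit existing code patterns instead of inventing parallel ones, cite specific file and function paths, include a verification plan |
| When to Converge | Plan completion criteria: all ambiguities are resolved, and the plan covers the scope of changes, the files, the code to reuse, and the verification method |
Assessment: high quality. It introduces a structured Explore→Capture→Ask loop and stresses "do not jump to implementation after reading the first file", which is fully consistent with the design intent of Plan Mode.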
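The "When to Converge" criteria can be read as a checklist. The following is a minimal sketch of that reading, not code from the PR: the `PlanDraft` shape and the `missingForConvergence` helper are hypothetical names introduced only for illustration, and the sample values are made up around the file and function names mentioned in this report.

```typescript
// Hypothetical shape for a plan draft; none of these names come from the PR.
interface PlanDraft {
  openQuestions: string[]; // ambiguities still waiting on the user
  scope: string;           // what will change
  filesToModify: string[]; // concrete file paths
  codeToReuse: string[];   // existing functions or patterns to build on
  verification: string[];  // how the change will be checked
}

// Returns the reasons the plan is not ready yet; an empty list means converged.
function missingForConvergence(plan: PlanDraft): string[] {
  const missing: string[] = [];
  if (plan.openQuestions.length > 0) missing.push("unresolved ambiguities");
  if (plan.scope.trim() === "") missing.push("scope of changes");
  if (plan.filesToModify.length === 0) missing.push("files to modify");
  if (plan.codeToReuse.length === 0) missing.push("existing code to reuse");
  if (plan.verification.length === 0) missing.push("verification plan");
  return missing;
}

// Illustrative draft: one open question and no verification plan yet.
const draft: PlanDraft = {
  openQuestions: ["Should the new reminder apply outside Plan Mode?"],
  scope: "Extend the Plan Mode system reminder",
  filesToModify: ["packages/core/src/core/prompts.ts"],
  codeToReuse: ["getPlanModeSystemReminder"],
  verification: [],
};

console.log(missingForConvergence(draft));
```

In this sketch the draft is not converged: it still has an open question and lacks a verification plan, so the loop would continue with another Ask or Explore step.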
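3.2 agent.ts: Agent tool prompt-writing guide (+15 lines)
Adds a "Writing the prompt" subsection:
- Five briefing principles: explain the goal and the reason behind it, state what has already been explored or ruled out, give enough background for the agent to make judgment calls, say so explicitly when a short response is needed, and provide an exact target for a lookup or the real question for an investigation
- "Terse command-style prompts produce shallow, generic work."
- "Never delegate understanding": do not write "based on your findings, fix the bug"; understand the problem yourself first, then spell out the specific file paths, constraints, and scope
- Do not fabricate or predict an agent's results before it returns
Assessment: borrows the agent prompting philosophy of Claude Code. The core principle, "Never delegate understanding", targets a common anti-pattern in agent use.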
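To make the contrast with a terse prompt concrete, here is a small sketch of how a brief following those principles might be assembled. The `AgentBrief` interface and `renderBrief` function are hypothetical and are not the Agent tool's actual schema; the sample values are illustrative only.

```typescript
// Hypothetical brief structure mirroring the briefing principles;
// this is NOT the real Agent tool schema.
interface AgentBrief {
  goal: string;            // what to accomplish and why
  context: string;         // background the agent needs for judgment calls
  alreadyKnown: string[];  // what was explored or ruled out
  target: string;          // exact target (lookup) or real question (investigation)
  responseFormat?: string; // set only when a short reply is wanted
}

// Renders the brief as plain text, one labeled line per field.
function renderBrief(brief: AgentBrief): string {
  const lines: string[] = [];
  lines.push("Goal: " + brief.goal);
  lines.push("Context: " + brief.context);
  if (brief.alreadyKnown.length > 0) {
    lines.push("Already explored or ruled out:");
    for (const item of brief.alreadyKnown) {
      lines.push("- " + item);
    }
  }
  lines.push("Target: " + brief.target);
  if (brief.responseFormat !== undefined) {
    lines.push("Response format: " + brief.responseFormat);
  }
  return lines.join("\n");
}

// The anti-pattern quoted in the report: it hands the understanding to the agent.
const tersePrompt = "based on your findings, fix the bug";

// Illustrative values only.
const detailedPrompt = renderBrief({
  goal: "Locate every place the Plan Mode reminder text is asserted, so the snapshots can be updated together",
  context: "The reminder wording is changing; no logic changes are intended",
  alreadyKnown: ["getPlanModeSystemReminder lives in packages/core/src/core/prompts.ts"],
  target: "Test files under packages/core that snapshot the Plan Mode reminder",
  responseFormat: "A list of file paths, no commentary",
});

console.log(tersePrompt.length < detailedPrompt.length);
console.log(detailedPrompt);
```

The point of the structure is that each field forces the caller to write down something it already understands, rather than leaving the agent to guess.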
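3.3 todoWrite.ts: task planning discipline (+6 lines)
Adds a "Planning with Todos" subsection:
- Before breaking work down into low-level edits, use todos to reflect a meaningful high-level approach (investigate → design → implement → verify)
- When new information changes the understanding of the task, update the todo structure instead of only appending scattered items
Assessment: concise and direct; it encourages task management from a global perspective.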
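The difference between restructuring and appending can be sketched as follows. The `Todo` type and the `replaceTodo` helper are assumptions made for this example and may not match the real todoWrite tool schema; the todo contents are illustrative.

```typescript
// Hypothetical todo shape; the real todoWrite schema may differ.
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  id: string;
  content: string;
  status: TodoStatus;
}

// High-level approach first: investigate -> design -> implement -> verify.
const plan: Todo[] = [
  { id: "investigate", content: "Investigate how the Plan Mode reminder is built", status: "completed" },
  { id: "design", content: "Design the new reminder structure", status: "in_progress" },
  { id: "implement", content: "Implement the prompt changes", status: "pending" },
  { id: "verify", content: "Verify with the affected unit tests", status: "pending" },
];

// Restructure in place: swap one item for its refined replacements,
// keeping the overall order instead of appending to the end.
function replaceTodo(todos: Todo[], targetId: string, replacements: Todo[]): Todo[] {
  const result: Todo[] = [];
  for (const todo of todos) {
    if (todo.id === targetId) {
      for (const replacement of replacements) {
        result.push(replacement);
      }
    } else {
      result.push(todo);
    }
  }
  return result;
}

// New information: the implementation step turns out to have two parts.
const revised = replaceTodo(plan, "implement", [
  { id: "implement-prompts", content: "Update the prompt text", status: "pending" },
  { id: "implement-snapshots", content: "Update the affected snapshots", status: "pending" },
]);

console.log(revised.map((t) => t.id).join(" -> "));
// prints "investigate -> design -> implement-prompts -> implement-snapshots -> verify"
```

Replacing the item where it sits keeps the verify step last, which appending two new items to the end would not.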
3.4 prompts.test.ts: snapshot update (+2/-2)
Only the snapshot of the Plan Mode system reminder was updated, matching the prompt text changes.
4. Test results
prompts.test.ts 60 passed (including the snapshot update)
agent.test.ts 75 passed
todoWrite.test.ts 28 passed
─────────────────────────────
Total 163 passed, 0 failed
Duration 6.69s
Zero failures. All snapshots have been updated to the new prompt text.
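5. Token cost assessment
The PR description claims a total increase of ~500 tokens. On review:
- Plan Mode reminder: from ~50 tokens to ~350 tokens (+300)
- Agent tool description: ~100 tokens added
- Todo tool description: ~50 tokens added
That totals roughly +450 tokens. The Plan Mode portion is a system reminder (getPlanModeSystemReminder in prompts.ts) that is injected only when Plan Mode is active, so it has no effect on normal conversations. The Agent and Todo additions live only in the tool descriptions and are loaded on demand.
Conclusion: the token overhead is acceptable and is a reasonable price for the improvement in planning quality.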
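The arithmetic behind the ~450 figure can be checked in a few lines. All numbers below are the approximate estimates quoted in this report, not measured token counts, and the variable names are made up for the example.

```typescript
// Approximate estimates quoted in the report; not measured values.
const planModeBefore = 50;
const planModeAfter = 350;
const agentDescriptionAdded = 100;
const todoDescriptionAdded = 50;

// Per-component increase and the overall total.
const planModeAdded = planModeAfter - planModeBefore;
const totalAdded = planModeAdded + agentDescriptionAdded + todoDescriptionAdded;

// Gap between the PR description's claim (~500) and the reviewed estimate.
const claimedTotal = 500;
const gap = claimedTotal - totalAdded;

console.log("Plan Mode reminder: +" + planModeAdded);
console.log("Total: +" + totalAdded + " (claimed ~" + claimedTotal + ", gap " + gap + ")");
```

With these estimates the Plan Mode reminder accounts for 300 of the 450 added tokens, which is why the fact that it is injected only in Plan Mode matters most for the cost assessment.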
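6. Merge recommendation
✅ Ready to merge.
Pure prompt enhancement, zero logic changes, all tests pass. The prompt text is high quality: structured, actionable, and free of conflicts with existing instructions. The Plan Mode iterative planning workflow fills the gap left by the earlier, oversimplified "1. answer 2. present plan". The "Never delegate understanding" principle addresses a real pain point in agent use.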
7. Reproduction guide
# Attach to the tmux session
tmux attach -t pr4436
# Verify the environment
cd /tmp/pr4436-test
# Run the affected tests
cd packages/core
npx vitest run --no-coverage \
src/core/prompts.test.ts \
src/tools/agent/agent.test.ts \
src/tools/todoWrite.test.ts
# View the prompt text changes
git diff $(git merge-base origin/main HEAD)..HEAD -- packages/core/src/core/prompts.ts
Report verified by Claude Opus 4.7 in a local tmux session, as a reference for the maintainer's merge decision.
tanzhenxin
left a comment
Review
The standout here is the plan-mode iterative workflow. The explore → capture → ask loop, the "engage the user early rather than exploring exhaustively first" guidance, and the convergence checklist are a genuine improvement to plan mode — and since it only applies in plan mode, the blast radius is contained. I'd be glad to see it land.
One thing before it does: the branch is a couple of weeks old and CI is red on an unrelated Windows UI test (sticky-todo footer height) that's since been reworked on main; a rebase should clear it — it's not anything in this PR.
Verdict
Comment — the plan-mode workflow is the keeper; I'd like to see it land once a rebase clears the unrelated CI failure.
tanzhenxin
left a comment
Follow-up to my earlier review (which endorsed the plan-mode iterative workflow). Re-checked at 513932e72:
- ✅ CI is now fully green: the unrelated windows-latest sticky-todo test flake that was the only outstanding blocker has cleared, so no rebase is needed.
- The Plan Mode "Iterative Planning Workflow" remains the genuinely valuable, non-duplicated contribution here, and there's no internal prompt contradiction.
One thing to handle at merge time (not a blocker for this PR): the agent.ts additions (## Writing the prompt + the "Never delegate understanding" paragraph) overlap with your own #4574, which is a strict superset of them. Whichever of the two lands second should drop the duplicated block so the Agent tool description doesn't carry the guidance twice.
Approving.
Summary
prompts.ts (Doing Tasks section + Plan Mode Iterative Workflow) and agent.ts ("Writing the prompt" / "Never delegate understanding" section).
Validation
prompts.test.ts 60 passed, todoWrite.test.ts 28 passed, agent.test.ts 75 passed
cd packages/core && npx vitest run src/core/prompts.test.ts (should pass clean). Then read the diff for prompt text quality.
Scope / Risk
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs