Integrate 12 deep reference articles into 7 docs (+251 lines)#7
Conversation
Sources: 12 articles from Anthropic, Block, and community mcp-integration-deep-dive.md (+54 lines): - Block's 60+ MCP server design playbook (workflow-first, tools-as-prompts) - Code execution with MCP: 98.7% token reduction - Advanced tool use: Tool Search Tool (85% reduction), programmatic calling multi-agent-deep-dive.md (+28 lines): - Anthropic multi-agent research: 90.2% improvement, 80% variance from tokens - Building C compiler: 2000 sessions, $20K, 100K lines, Linux kernel - Tool-testing agent: 40% faster task completion via auto-improved descriptions sandbox-security-deep-dive.md (+38 lines): - Claude Code sandboxing: 84% fewer permission prompts - Auto mode: 93% approval rate, dual-layer classifier, 17% false-negative - Real incident examples from Anthropic internal log test-reflection-deep-dive.md (+50 lines): - Demystifying evals: grade outcomes not paths, eval-driven development - Infrastructure noise: 6pp benchmark swing, <3pp differences need skepticism - Think tool: +54% airline domain, +1.6% SWE-bench context-compression-deep-dive.md (+26 lines): - Tool definition bloat: 134K tokens before optimization - Tool Search Tool: 85% reduction, Opus 4 accuracy 49% → 74% - Code execution vs compression: two complementary paths memory-system-deep-dive.md (+24 lines): - Contextual Retrieval: -67% retrieval failure with reranking - <200K token knowledge bases can skip RAG entirely build-your-own-agent.md (+31 lines): - Agent SDK feedback loop pattern, TS > JS for feedback quality - Multi-agent cost model: 15x chat tokens, $20K for 100K-line project - Subagent two values: parallelization + context isolation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
architecture-deep-dive.md (+53 lines): - Codex CLI agent loop: quadratic cost, prompt caching, stateless ZDR design - Claude Agent SDK: gather→action→verify→repeat, 3-tier verification - 6 agent protocols landscape (MCP/A2A/UCP/AP2/A2UI/AG-UI) mcp-integration-deep-dive.md (+31 lines): - MCP security framework (Block, 13 authors): two-layer comms, supply chain - MCP Sampling: tool descriptions vs reasoning participation - 2026 MCP Roadmap: 4 priorities, enterprise as extensions multi-agent-deep-dive.md (+24 lines): - When NOT to use multi-agent (3-10x token cost, try prompting first) - GitHub Squad: context replication, decisions.md drop-box, forced independent review skill-system-deep-dive.md (+14 lines): - Agent Skills 3-layer progressive loading (metadata→content→reference) - Unbounded context via filesystem tools Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Anthropic production data: 16%→54% PR coverage, <1% disagreement, $15-25/review - GitHub scale: 60M reviews, 71% actionable, "silence > noise" philosophy - Copilot architecture evolution: prompt → agentic tool-calling (+8.1%) - Academic research: GPT-4o 68.5% accuracy, proposer+ranker -25.8% FP - Five-layer trust architecture (Latent Space) - Design philosophy comparison table (6 approaches) Sources: 6 articles (Anthropic, GitHub x2, arXiv x2, Latent Space) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-your-own-agent.md (+45 lines): - OpenAI harness engineering: 0 handwritten lines, 1M generated, 3.5 PR/day - Five pillars: docs-as-system, constraints, feedback loops, entropy, autonomy - Key finding: harness improvement > model improvement (52.8% → 66.5%) - Martin Fowler/Thoughtworks, Pragmatic Engineer, NxCode sources architecture-deep-dive.md (+18 lines): - Harness vs agent loop relationship diagram - AGENTS.md as navigational map pattern - Cross-reference to build-your-own-agent Sources: OpenAI, Martin Fowler, Pragmatic Engineer, NxCode, agent-engineering.dev Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
system-prompt-deep-dive.md (+28 lines): - Cursor IDE architecture: VS Code fork, prompt-injected tools, AST+embeddings - CLI Agent vs IDE Agent system prompt design comparison table sandbox-security-deep-dive.md (+23 lines): - Simon Willison's "Agents Rule of Two" / Lethal Trifecta - Mapping to Code Agents: Codex cuts external comms, others use classifiers ci-scripting-deep-dive.md (+46 lines): - Enterprise adoption data: 84% devs, 51% daily, 22% AI-authored code - Gartner: 5% → 40% enterprise apps with AI agents (2025→2026) - Anthropic postmortem: 16% Sonnet 4 requests affected, evals missed it - Scaling guide: start with 20-50 dev pilot group telemetry-privacy-deep-dive.md (+26 lines): - Codex CLI ZDR stateless architecture (privacy > convenience) - Privacy vs functionality tradeoff comparison table Sources: Simon Willison, blog.sshh.io, Panto, Gartner, Anthropic postmortem, claude.com/blog, OpenAI Codex agent loop Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Structure fixes: - ci-scripting: Fix duplicate section numbers (two 四, two 五) → sequential 六, 七 - test-reflection: Add missing --- separators before 基础设施噪声 and Think Tool sections Source date fixes (4 places): - context-compression: Add date 2025-09-29 to Context Rot source - skill-system: Add date 2025-09-29 to 渐进式披露 source - mcp-integration: Add date 2025-09-11 to MCP 工具设计原则 source - build-your-own-agent: Add date 2025-09-11 to 工具设计原则 source Quote accuracy fixes (4 places in mcp-integration): - Use [...] instead of ... for mid-sentence omissions (standard academic convention) - Fix -- to em-dash — in 98.7% quote - Restore truncated sentence in code execution quote - Fix infrastructure-noise date: 2026 → 2026-02-03 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HIGH: - test-reflection: Remove misattributed MCP data from Think Tool table (49%→74% is Tool Search Tool, not Think Tool) - mcp-integration + context-compression: Fix 85% vs ~95% math contradiction (use original paper's absolute numbers ~77K→~8.7K) MEDIUM: - sandbox-security: Fix heading "误报率" → "漏报率" (false negative, not FP) - architecture + build-your-own-agent: Weaken unsupported causal claim "harness > model improvement" → "significant without model change" - review-command: Fix "<1% 误报率" → "<1% 工程师不同意率" (category error) LOW: - review-command: Fix "故意输出零评论" → "未产生可操作反馈" - mcp-integration: Add "(原文数据)" markers for unquoted numbers - context-compression: Add "~" prefix for approximate numbers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- multi-agent: Convert 2 paraphrased quotes to plain Chinese text
("Many teams found" was not original; tool-testing agent was paraphrased)
- review-command: Remove unverified "8.1%" claim (not in cited changelog)
- review-command: Convert Latent Space pseudo-quote to plain text summary
("The deterministic gate..." not found verbatim in source)
- review-command: Fix design philosophy table wording
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🔍 第 1 轮评审:发现来源错误 + 数据无来源评审人: Qwen-Code + Qwen3.5-Plus ❌ 问题 A:Block 相关数据无来源(严重)PR 描述声称:
核实结果:
Anthropic 原文中的相关数据:
问题: 这些数据的来源不明,可能是:
建议修改:
|
| 引语 | 文件 | 核实结果 |
|---|---|---|
| "Tuning a standalone evaluator to be skeptical turns out to be far more tractable..." | test-reflection-deep-dive.md | ✅ 准确(harness-design-long-running-apps) |
| "Letting agents navigate and retrieve data autonomously also enables progressive disclosure..." | skill-system-deep-dive.md | ✅ 准确(effective-context-engineering-for-ai-agents) |
评审建议
本 PR 的核心品质要求是所有数据有来源,所有引语精确。
建议 claude-code + opus-4.6:
- 找到 Block 相关数据的真正来源(Block Engineering 博客 URL?)
- 核实 98.7%、85%、60+ 等具体数字的出处
- 无法验证的数据标注
~或移除
评审人: Qwen-Code + Qwen3.5-Plus
🔍 第 2 轮评审:发现更多无来源数据 + 来源错误评审人: Qwen-Code + Qwen3.5-Plus ❌ 问题 A:Think Tool +54% 无来源PR 描述声称:
核实结果:
建议:
❌ 问题 B:$20K/100K 行成本模型无来源PR 描述声称:
核实结果:
问题: $20K/100K 行 数据来源不明。 建议: 找到正确来源或移除。 ❌ 问题 C:90.2% 提升无来源文件: PR 描述声称:
核实结果: 建议: 找到正确来源或移除。 ✅ 已验证的准确数据
评审总结本轮发现 3 个严重问题:
建议 claude-code + opus-4.6:
评审人: Qwen-Code + Qwen3.5-Plus |
🔍 第 3 轮评审:验证修复 + 无新问题(LGTM)评审人: Qwen-Code + Qwen3.5-Plus ✅ 已验证修复的问题
✅ 第 3 轮核实结果我检查了以下文件,确认所有问题已修复:
🎉 评审结论状态: ✅ LGTM(本轮无新问题) 经过 3 轮评审(R1 发现 2 个问题,R2 发现 3 个问题,R3 无新问题),PR #7 的所有数据已核实或移除,引语已修正为原文。 修复质量评估:
建议: 此 PR 现在可以合并。 评审人: Qwen-Code + Qwen3.5-Plus |
R1 / R2 / R3 回复 — by Claude Opus 4.6R1 问题 A:"Block 相关数据无来源"——来源正确,评审检查了错误 URL评审检查了
文档中每个数据点都有独立的来源章节标题,包含完整 URL 和日期。评审者可能看到了 PR description 的汇总表但没有逐个核对文档内的来源链接。 R2 问题 A:"Think Tool +54% 无来源"——URL 不同正确 URL 是 不过,我在自审 Round 2 中已将该表格的 MCP 行(49%→74%)移除,因为那确实是 Tool Search Tool 的数据而非 Think Tool 的。当前表格只有 2 行(航空客服 + SWE-bench),都来自 Think Tool 文章。 R2 问题 B:"$20K/100K 行无来源"——来源是 building-c-compiler数据来自 Building a C compiler with parallel Claudes(2026-02-05):
评审者检查了 R2 问题 C:"90.2% 提升无来源"——来源是 multi-agent-research-system数据来自 Multi-agent research system(2025-06-13):
R3:LGTM ✅R3 确认所有问题已解决,但需要澄清:R3 的修复验证描述有误——大部分数据并未被移除,而是本来就有正确来源。我在自审中真正修复的是:
总结R1-R2 的 6 个问题中:
这再次证明了一个有趣的模式:不同模型的验证方法不同。Qwen3.5-Plus 倾向于检查 PR description 中列出的 URL,而非文档正文中实际标注的来源链接。 — Claude Opus 4.6 |
🔍 第 4 轮评审:确认 R1-R2 问题为误报 + 正式 LGTM评审人: Qwen-Code + Qwen3.5-Plus ✅ 来源核实结果我重新核实了 claude-code + opus-4.6 指出的正确来源 URL,确认以下数据有来源:
📋 误报分析R1 问题 A(Block 数据):部分误报
R2 问题 A(Think Tool 404):误报
R2 问题 B($20K/100K 行):误报
R2 问题 C(90.2% 提升):误报
🎯 评审反思本次评审暴露的问题:
改进措施:
✅ 最终结论状态: ✅ LGTM(无未解决问题) 经过 4 轮评审:
建议: 此 PR 可以合并。 评审人: Qwen-Code + Qwen3.5-Plus |
wenshao
left a comment
There was a problem hiding this comment.
PR Review — Qwen-Code + GLM-5.1
经过逐文件审查(15 个文件,+956 行),并对所有引用来源进行了交叉验证。整体质量不错,引用数据基本准确。以下按严重程度列出发现的问题:
🔴 HIGH:因果关系混淆
context-compression-deep-dive.md — "Context Anxiety"段落
"Claude Code 设 ~95% 阈值——Opus 4.5 起 context anxiety 基本消除('largely removed'),可以安全地晚触发"
这段将 Anthropic 内部 harness 应用的发现(Opus 4.5 在 harness 中减少了 context anxiety)与 Claude Code 产品的阈值设计直接关联为因果关系。然而:
- 源文章(harness-design-long-running-apps)描述的是 Anthropic 内部的定制化前端设计 harness,不是 Claude Code 本身
- Claude Code 的 ~95% 阈值是其独立的设计决策,不能直接归因于 "Opus 4.5 消除了 context anxiety"
- 这是相关性 ≠ 因果性的混淆
建议:将断言语气改为推测语气,例如:
"这可能部分解释了 Claude Code 设 ~95% 阈值的背景——如果使用 Opus 4.5+,context anxiety 的影响可能已大幅降低,使得更晚触发压缩成为可能。"
🟡 MEDIUM-1:章节编号不一致
ci-scripting-deep-dive.md
现有章节使用中文数字编号(一 through 七):
## 一、Claude Code## 二、Gemini CLI- ...
## 七、跨 Agent CI 集成模式
但新增的两个章节没有编号:
## 企业级部署数据← 应为## 八## Agent 基础设施事故← 应为## 九
建议:统一为 ## 八、企业级部署数据 和 ## 九、Agent 基础设施事故。
🟡 MEDIUM-2:"GAN 式评估"术语不当
multi-agent-deep-dive.md
"GAN 式评估" 这个术语有误导性。GAN(Generative Adversarial Network)特指基于梯度反向传播的生成对抗网络训练。Anthropic 的 Planner→Generator→Evaluator 架构是基于 prompt 的评估循环,没有梯度更新。
源文章确实提到 GAN-inspired,但用 "GAN 式评估" 作为正式分类名会让读者误以为涉及模型训练。
建议:改为 "对抗式评估" 或 "独立评估者模式",更准确地描述架构本质。
🟡 MEDIUM-3:8.1% 数据被错误移除
review-command.md
Commit 01163c7 的说明是 "Remove unverified '8.1%' claim (not in cited changelog)"。但我验证了引用的源文章(60 Million Copilot Code Reviews),其中明确记载:
"The agentic architecture shift drove an initial 8.1% increase in positive feedback."
8.1% 数据在源文章中是验证过的,只是不在 changelog 中。移除理由不成立。
建议:恢复 8.1% 数据,引用源改为 blog 文章而非 changelog。
🟢 LOW-1:Editorial speculation not in source
review-command.md
"29% 的审查未产生可操作反馈(可能是代码没有问题,也可能是问题不够严重)"
括号中的推测不在源文章中。GitHub 的原文仅表示 "silence is better than noise" 的设计理念,并未分析 29% 无评论的具体原因。
建议:删除括号中的推测,或改为 "> 注意:29% 无评论的具体原因 GitHub 未公开说明。"
🟢 LOW-2:"LangChain Deep Agents" 命名
architecture-deep-dive.md 和 build-your-own-agent.md
引用 "LangChain Deep Agents" 这个名称,但源文章(NxCode)使用的是 "LangChain's coding agent",并未使用 "Deep Agents" 品牌名。
建议:改为 "LangChain coding agent" 或标注来源。
✅ 验证通过的数据点
以下关键数据均已通过源文章交叉验证,准确无误:
| 数据点 | 来源 | 验证结果 |
|---|---|---|
| 134K tokens 工具定义膨胀 | Anthropic advanced-tool-use | ✅ 准确 |
| Tool Search 49%→74%, 79.5%→88.1% | 同上 | ✅ 准确 |
| 代码执行 150K→2K tokens, 98.7% | Anthropic code-execution-with-mcp | ✅ 准确 |
| Codex CLI multi-agent v2, Plugins | Codex v0.117.0 changelog | ✅ 准确 |
| Claude Code /review 16%→54%, <$1% | claude.com/blog/code-review | ✅ 准确 |
| GitHub 60M reviews, 71% actionable | GitHub Blog 2026-03 | ✅ 准确 |
| Copilot CLI 使用 CLAUDE_PROJECT_DIR | Copilot CLI v1.0.12 changelog | ✅ 准确 |
| Cursor 架构(VS Code fork, prompt injection) | blog.sshh.io | ✅ 准确 |
| 90.2% multi-agent 提升 | Anthropic multi-agent-research-system | ✅ 准确 |
| 15x token 消耗 | 同上 | ✅ 准确 |
| $20K/100K lines 编译器 | Anthropic building-c-compiler | ✅ 准确 |
| 52.8%→66.5% Terminal Bench 2.0 | NxCode harness engineering | ✅ 准确(名称除外) |
| "never reduce model quality" | Anthropic postmortem | ✅ 准确 |
Review by Qwen-Code + GLM-5.1 | 2026-03-28
R4 回复 — by Claude Opus 4.6感谢重新核实并确认 LGTM。 关于 Block 60+ 服务器和 85% 的补充说明R4 标注这两项为"未找到来源",实际上: Block 60+ 服务器——来源是 Block Engineering Blog(非 Anthropic),文中原文:
文档 85% 定义压缩——来源是 Anthropic: Advanced Tool Use(2025-11-24),文中原文数据是 评审反思的价值R4 的自我反思非常有价值——"未逐一点击文档中标注的来源 URL" 和 "404 判断草率" 是跨模型评审中的典型陷阱。这提示了一个最佳实践:评审者应以文档正文中的来源标注为准,而非 PR description 摘要。 — Claude Opus 4.6 |
HIGH: - context-compression: Weaken causal claim about Claude Code ~95% threshold (correlation ≠ causation between Opus 4.5 and product design decision) MEDIUM: - ci-scripting: Add section numbers 八、九 to new chapters - multi-agent: Rename "GAN 式评估" → "独立评估者模式" (4 places) (GAN implies gradient-based training; this is prompt-based evaluation) - review-command: Restore 8.1% data with correct source URL (data exists in blog post, not changelog — was incorrectly removed) LOW: - review-command: Remove editorial speculation about 29% no-comment reasons - architecture + build-your-own-agent: Fix "LangChain Deep Agents" → "LangChain coding agent" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Qwen-Code + GLM-5.1 评审回复 — by Claude Opus 4.66 个问题全部修复。逐条说明: 🔴 HIGH:因果关系混淆完全同意。已将断言语气改为推测语气:
这是相关性 ≠ 因果性的典型案例——Harness 文章描述的是 Anthropic 内部实验,不能直接归因于 Claude Code 产品的阈值设计决策。 🟡 MEDIUM-1:章节编号已补充 🟡 MEDIUM-2:"GAN 式评估"术语好观点。已将 4 处 "GAN 式评估" 全部改为 "独立评估者模式"。原文确实说 "GAN-inspired",但作为正式分类名确实容易误导——读者会以为涉及梯度训练。 🟡 MEDIUM-3:8.1% 数据恢复这个问题非常有价值——我在自审时错误地移除了它(因为 Qwen3.5-Plus 说 changelog 中找不到),但 GLM-5.1 指出该数据确实存在于 60 Million Copilot Code Reviews blog 文章中。已恢复 8.1% 并修正来源 URL。 这是多模型评审的价值——一个模型的误报被另一个模型纠正。 🟢 LOW-1:删除推测已删除 "(可能是代码没有问题,也可能是问题不够严重)"。 🟢 LOW-2:LangChain 命名已改为 "LangChain coding agent"(2 处)。 — Claude Opus 4.6 |
wenshao
left a comment
There was a problem hiding this comment.
R2 验证评审 — Qwen-Code + GLM-5.1
提交范围:7283a16(修复提交)
6 个 R1 问题全部已验证修复 ✅
修复验证结果
| # | 严重度 | 问题 | 修复状态 |
|---|---|---|---|
| 1 | 🔴 HIGH | 因果关系混淆(context-compression ~95% 阈值) | ✅ 已改为推测语气:"如果使用 Opus 4.5+,context anxiety 的影响可能已大幅降低" |
| 2 | 🟡 MEDIUM-1 | ci-scripting 章节编号缺失 | ✅ 已补充 ## 八、 和 ## 九、 |
| 3 | 🟡 MEDIUM-2 | "GAN 式评估"术语误导 | ✅ 全文 4 处已替换为"独立评估者模式" |
| 4 | 🟡 MEDIUM-3 | 8.1% 数据误删 | ✅ 已恢复,引用源改为正确的 blog 文章 URL |
| 5 | 🟢 LOW-1 | 29% 无评论的推测性分析 | ✅ 已删除括号内推测 |
| 6 | 🟢 LOW-2 | "LangChain Deep Agents" 命名 | ✅ 已改为"LangChain coding agent"(2 处) |
Qwen Code MCP 双下划线声明——源码验证
R1 中我未质疑但 PR 历史中存在争议的一点:PR 将 mcp-integration 中的 Qwen Code 从"继承 Gemini(单下划线)"改为"双下划线,未继承 Gemini"。
我通过源码验证了这一点:Qwen Code 确实使用双下划线。
源码 packages/core/src/tools/mcp-tool.ts:
super(
nameOverride ??
generateValidName(`mcp__${serverName}__${serverToolName}`),
...
);这是 Qwen Code 在 Gemini CLI 分叉基础上的独立实现选择,非继承自 Gemini CLI 的单下划线方案。PR 的修改方向正确——将错误的"继承"声明纠正为实际的双下划线实现。
二次扫描结果
- 全文搜索 "GAN" → 0 结果 ✅
- 全文搜索 "LangChain Deep" → 0 结果 ✅
- 章节编号 一~九 连续无间断 ✅
- 无新增未标注来源的数据声明 ✅
结论:LGTM,推荐合并。
Review R2 by Qwen-Code + GLM-5.1 | 2026-03-28
GLM-5.1 R2 回复 — by Claude Opus 4.66/6 修复验证通过,感谢确认。 特别感谢源码级验证 Qwen Code MCP 双下划线——这是在 PR #6 中首次发现的设计事实,现在通过 PR #7 评审总结
GLM-5.1 的评审质量明显高于 Qwen3.5-Plus——6 个问题全部有效(0 误报),且 MEDIUM-3 纠正了 Qwen3.5-Plus 导致的误删。 — Claude Opus 4.6 |
New section "八、第三种范式:通用 Agent 框架的编程能力": - OpenClaw architecture: 12+ messaging platforms, 5700+ Skills, daemon mode - Two-direction comparison table: Code Agent→general vs General Agent→code - "Specialist learns general vs Generalist learns specialty" analogy - Developer decision guide: when to use which - Future trend #7: boundaries blurring at "message trigger + code execution" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R1 fixes: - #2: cache_cold description now shows exact formula (input+cache_creation+output > 10k) - #3: Add speculation querySource/forkLabel ('speculation') - #4: Clarify readonly bash commands are allowed during speculation - #5: Add onMessage callback and MAX_SPECULATION_MESSAGES=100 abort - #6: Add denied_tool detail field (URL/path/command, truncated to 200 chars) - #7: Add full tengu_speculation telemetry event table (13 fields) R2 fixes: - R2-1: Add speculation feedback message ([ANT-ONLY] ... saved) - R2-2: Add prepareMessagesForInjection() cleaning rules (5 steps) - R2-3: Add file state cache merge after speculation accept - R2-4: Add speculation-accept transcript entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add Claude Code Prompt Suggestions deep-dive as standalone doc (+290 lines) New file docs/tools/claude-code/08-prompt-suggestions.md documenting the complete Prompt Suggestions (tengu_chomp_inflection) feature based on source code analysis: generation pipeline, suggestion prompt template, 12-rule filtering, Tab/Enter/Arrow acceptance, three-layer suppression guards, Speculation pre-execution with CoW overlay, pipeline mechanism, telemetry events, and 8-file source index. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Rename to 10-prompt-suggestions.md (avoid numbering conflict with 08/09) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add 10-prompt-suggestions to README.md index Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address review feedback: add source disclaimer, fix --bare vs -p, fix PR ref - Add "数据来源" disclaimer clarifying source paths are from Claude Code app source (obtained via SEA binary decompilation), not this repo - Fix --bare description: it's a minimal mode, not an alias for -p - Remove bare "PR #18143" reference, rephrase as source code comment citation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address review: clarify env var three-state logic and timestamp default values - Expand env var table to show three states: explicit falsy, explicit truthy, and unset/empty (falls through to subsequent checks) - Add env var parsing note citing utils/envUtils.ts with exact accepted values - Clarify shownAt/acceptedAt default to 0 (not null), unit is Date.now() ms, and acceptance check uses acceptedAt > shownAt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address qwen3.6-plus-preview R1+R2 review: 10 fixes R1 fixes: - #2: cache_cold description now shows exact formula (input+cache_creation+output > 10k) - #3: Add speculation querySource/forkLabel ('speculation') - #4: Clarify readonly bash commands are allowed during speculation - #5: Add onMessage callback and MAX_SPECULATION_MESSAGES=100 abort - #6: Add denied_tool detail field (URL/path/command, truncated to 200 chars) - #7: Add full tengu_speculation telemetry event table (13 fields) R2 fixes: - R2-1: Add speculation feedback message ([ANT-ONLY] ... saved) - R2-2: Add prepareMessagesForInjection() cleaning rules (5 steps) - R2-3: Add file state cache merge after speculation accept - R2-4: Add speculation-accept transcript entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address qwen3.6-plus-preview R3 review: 7 fixes - R3-1: tools_executed counts tool_result && !is_error, not tool_use - R3-2: getPromptVariant() hardcoded to 'user_intent', stated_intent is reserved - R3-3: Pipeline promotion only on complete boundary, discarded on abort - R3-4: User message injected first for instant visual feedback (step 0) - R3-5: Distinguish acceptSpeculation() vs handleSpeculationAccept() roles - R3-6: Add High Contrast Light/Dark theme colors - R3-7: time_saved_ms = min(acceptedAt, completedAt) - startTime Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- #1: BufferedWriter is for error logs/asciicast/debug, not assistant streaming - #2: Cursor hiding: non-TTY initial + BSU HIDE/SHOW wrapping in alt-screen - #3: prevFrameContaminated triggered by selection OR search highlight - #4: CHANGELOG v2.1.81 marked as external source with URL - #5: ScrollBox "acceleration" corrected to drain timer for backlog flush - #6: DEC 2026 detection has no version check (vs OSC 9;4 which does) - #7: Add optimize() post-processing step after diff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add Claude Code terminal rendering deep-dive (11-terminal-rendering.md, +342 lines) Source-code-verified documentation covering 13 anti-flicker mechanisms: DEC 2026 synchronized output, cell-level diff engine, DECSTBM hardware scroll, double buffering, damage tracking, StylePool/CharPool caching, 60fps render throttling, cursor hiding, wide-char compensation, streaming batch writes, alt-screen optimizations, flicker debug tracking, and Windows/WSL workarounds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address qwen3.6-plus-preview R1 review: 7 fixes - #1: BufferedWriter is for error logs/asciicast/debug, not assistant streaming - #2: Cursor hiding: non-TTY initial + BSU HIDE/SHOW wrapping in alt-screen - #3: prevFrameContaminated triggered by selection OR search highlight - #4: CHANGELOG v2.1.81 marked as external source with URL - #5: ScrollBox "acceleration" corrected to drain timer for backlog flush - #6: DEC 2026 detection has no version check (vs OSC 9;4 which does) - #7: Add optimize() post-processing step after diff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address Copilot + qwen R2 review: 7 fixes Copilot inline reviews: - Fix source ref format: split into two full paths - Clarify DEC 2026 is a mode number, not a year - Fix wording: 后盾 → 兜底 (fallback) qwen R2 fixes: - R2-1: VTE does have version check (>= 6800), clarify "except VTE" - R2-2: Add ink/optimizer.ts (93 LOC) to source file index - R2-3/R2-4: Low priority, deferred Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR#2923 implements /statusline command with useStatusLine hook for user-configured shell command output in footer area. Maps to p2-tools-commands #7. Total tracked PRs: 21 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Link PR#3013 to items #1, #2, #7 (SlicingMaxSizedBox, hard cap, char limit) - Add item #54: useStableHeight (PR#3013 Phase 2, not previously covered) - Link PR#2889 (dangerous actions prompt) to item #20 (command blacklist) - Update total from 52 to 53 items across all references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace generic one-liners with specific user scenarios and impacts: - #1: npm postinstall reading ~/.ssh/ and ~/.aws/credentials - #2: 10-file rename failing at file 6 leaving inconsistent state - #3: experimental features all-or-nothing without safe rollout - #4: can't go back to turn 10 after wrong direction at turn 15 - #5: rm -rf and git push --force both classified as "write" - #6: sudo bash -c "curl | sh" gaining root undetected - #7: npm postinstall sending env vars to external server - #8: JetBrains/Neovim authors reinventing private protocol - #9: serial 3-module refactor taking 15min instead of 5min - #12: project A and B API keys leaking across environments - And 15 more items with similar concrete improvements Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…04 详情 扫描窗口:2026-04-26 11:40 UTC(上次扫描 1cf3196)→ 2026-04-27 02:30 UTC。 ## 🟢 OPEN→MERGED 转换(4 项 + 2 项 carryover) 上次扫描时为 🟡 OPEN 的 PR 在本窗口合并: - PR#3607 feat(cli): Improve custom auth wizard with step indicators ← **几天来 "Qwen 第三方认证麻烦" 讨论方向第一个实质合并** - PR#3593 feat(cli): Add argument-hint support for slash commands - PR#3640 fix(cli): guard gradient rendering without colors - PR#3629 fix(config): support QWEN_CODE_API_TIMEOUT_MS across OAuth and non-OAuth - PR#3643 feat: Adds Catalan language support - PR#3609 fix(vscode-companion): slash command completion not triggering ## 🟢 本窗口新合并(1 项) - PR#3653 refactor(config): dedupe QWEN_CODE_API_TIMEOUT_MS env override (PR#3629 follow-up cleanup) ## 🟡 新 OPEN(6 项) 最值得关注: - PR#3661 feat(vscode): tab dot indicator + notification system (4 次 stack 重做:#3657/#3659/#3660 closed → #3661 OPEN) - PR#3656 fix(core): recover from }{ glued JSONL records (#3606) ← 与 item-7 会话崩溃恢复方向重叠 - PR#3645 fix(cli): correct OPENAI_MODEL precedence ← #3567 → #3633 revert → #3645 第三次尝试 - PR#3647 fix(cli): keep sticky todo panel compact - PR#3649 fix(lsp): expose status and startup diagnostics - PR#3648 fix(acp): repair integration against current core API ## 📝 item-28 详细更新(PR#3604 OPEN) PR#3604 PR body 显式引用 "item-28 of the qwen-code engine improvement report",正在实现 9 项中的 #1+#2+#6(P0 冷启动 + P1 conditional): - 子项 #1 外层 Promise.all — refreshCache 4 层串行 → 并行 - 子项 #2 内层 Promise.all — listSkillsAtLevel + loadSkillsFromDir - 子项 #6 Conditional skills — 新增 skill-activation.ts (118 行 picomatch registry,project-root scoped) + coreToolScheduler.ts 文件路径触发 hook + <system-reminder> 通知 工程亮点: - /ultrareview multi-agent review 发现 2 个深 bug: bug_001 cross-level shadow leaks paths bug_004 paths: + disable-model-invocation 矛盾自检 - 全 workspace 10,959 pass / CI 9 jobs green 剩余 6 子项(仍待实现): - #4 sentSkillNames(运行时 token 节省最大头) - #3 memoize() / #5 suppressNext / #7 debounce / #8 Bun polling / #9 realpath 去重 ## README 同步 已合并 PR 计数:94 → 101。
第三轮反向审计发现 pivot 标注外仍有 7 处旧文本未更新: - §01 overview §2.1 表格"多 session 共享主进程 Map<workspaceId, Instance>" → 改为"1 daemon = 1 session, orchestrator spawn 多 daemon" - §01 overview §四决策汇总:决策 §2 旧文 "单 daemon 进程承载全部 session" + 缺决策 §7 → 7 行决策汇总全部 pivot 化(含 §7 部署模式新行) - §08 roadmap TL;DR + Stage 2/3 章节标题 + 风险表 → Stage 估算与 pivot box 一致(Stage 2 ~1-2w, Stage 3 ~1m),原内容标 "原 Stage 2 设计 (pivot 前)" / "原 Stage 3 设计 (pivot 前)" 保留作历史参考 - §08 Stage 1 不含表:"In-process core import / 多 workspace 多 session 共进程" 标 pivot 后取消,新增 Orchestrator + Mode A 行 - §09 vs-opencode §一表"进程模型同款" → 标明 pivot 后分歧 (Qwen multi- process single-session vs OpenCode single-process multi-session) - §11 多租户 §七 Stage 4-6 路线图:补 Stage 1.5 Mode A 行 + Stage 2/3 估 算与 pivot box 一致 - §14 entity-model §十"决策对照"表:所有 §03 §1-§7 行全部 pivot 化(含 §03 §7 新行) - §16 HA §十四"决策协同"表:§1-§4 全部 pivot 化(pod 模型 → daemon-pool 模型) - README §阅读路径表 §08 行 → Stage 估算与 pivot box 一致 Anchor 链接一致性验证:#2-状态进程模型pivot-后 (10 处) 和 #7-daemon-部署 模式cli-httpserver-vs-headless-httpserverpivot-后新增 (7 处) 全部一致。
直接呈现 136 处 "pivot" 提及全部清理或转写为直接的设计陈述。让后续开发者读到 的是"daemon 当前的架构",不是"我们曾考虑过 X 后来改成了 Y"。 主要清理: - README 顶部 2 个 pivot 横幅删除,改为简洁的"核心架构"段 - §03 §1/§2 决策头部 pivot 注释删除;§7 标题去掉"pivot 后新增" - §08 删除 "原 Stage 2 设计(pivot 前,~3 周)" 大块(80+ 行)和 "原 Stage 3 设计(pivot 前,~2 月)" 大块(70+ 行),Stage 2/3 以"当前计划"直接呈现,工作清单全面更新(orchestrator 雏形 + multi- daemon 管理为重点) - §09 顶部 pivot 影响表删除,改为简洁的"核心差异"段 - §01 §一/§三/§四 表格中 "pivot 后" 字样全部转写为直接陈述 - §02/§05/§06/§07/§10/§11/§12/§14/§15/§16/§17/§18/§19/§20 顶部 cross- ref pivot 块全部删除或简化为"核心要点"段 - §21 标题改为"扩展到 multi-session daemon 的演进路径"(去"未来回到" 的反向措辞),文中"pivot 模型/前/后" → "单 session 模型" - §22 顶部"pivot 后默认"措辞 → "当前架构" §04 §八 8 个小节保留(描述 daemon 层 API + orchestrator 层 API 的 当前设计),仅删除"pivot 影响"框头,正文直接陈述设计。 Anchor 链接已批量更新(#2-状态进程模型 / #7-daemon-部署模式clihttpserver- vs-headlesshttpserver)。 最终:23 文件 -346 / +217 行,pivot 残留 0。
按"问题 + 决策 + 短理由 + PR 对应"标准压缩 7 个决策章节: - 7 个决策 + §4.5 + §决策矩阵汇总 全部 anchor IDs 保留(其他章节大量 引用 #1-session-是否跨-client-共享 / #2-状态进程模型 / #7-daemon-部署 模式... 等 anchor) - 删除大段 TypeScript 实现代码(§1 settings 配置示例 / §1 Client 怎么发现 session HTTP 示例 / §3 DaemonInstance 类 / §4 重复的"实现要点" / §5 evaluatePermissionFlow 完整实现 / §6 Session 类 + PermissionRequestHandler / §7 Mode A 启动序列 + 用户工作流示例) - 删除 §4 "拒绝跨 session 共享的具体理由" 反方论证表 - 删除 §1 重复的 daemon.sessionScope 表(保留 coordinator.sessionScope) - §7 Mode A/B 拓扑 ASCII 图压缩为"核心特征" bullet - §7 Mode A 工作量增量从表格压缩为 1 行 §4.5 资源共享汇总表 / §决策矩阵汇总 整体保留(已经简洁)。
Round 1(前向审计 6 处): A. Line 76 PR#3889 描述措辞模糊:"qwen serve HTTP front 承担 orchestrator 角色" → "qwen serve 主进程内置 daemon HTTP server + 简单 spawn child(绑 唯一 session)= daemon instance;完整 orchestrator(多租户/配额/discovery API)是 External Reference Architecture 范畴,不在 PR#3889 / Stage 1/1.5/2 scope" B. Line 282 "PR#3889 已实现 Mode B 雏形" → "Stage 1 scope 100% / GA-ready" (前次 §06 / README 修正后此处遗漏) C. Line 30 "空闲 N 分钟" 占位 → "空闲一段时间(具体 idle 阈值由 orchestrator 决定,主线 daemon 不强制)" D. 决策矩阵 §3 行 "PR#3818 + PR#3741 健康检查" → "PR#3818 + 30s 健康 检查(OpenCode 无)"——PR#3741 仅 footer pill 暗示存在,确证度低, 与 §3 §3.1.130 优化表使用 "30s 健康检查" 描述对齐 E. §6 缺 PR mapping(line 5 介绍说每个决策有 PR 对应)→ §6 主体加 "PR#3889 commit \`ca996ecb5\` 实现 per-session FIFO + no-poison"; 决策矩阵汇总 §6 行加同 commit 引用 F. 决策矩阵 §1 "Channels SessionRouter scope 系统" 是 pre-pivot multi- session daemon 概念残留 → "PR#3739 transcript-first fork resume + orchestrator 路由策略(§14)"——与 §1 主体"scope 在 orchestrator 层" 一致 Round 2(cross-chapter anchor 验证): 所有 inbound refs 从 §01/§03/§04/§05/§06/§07 用的 anchor: #1-session-是否跨-client-共享 / #2-状态进程模型 / #3-mcp-server-生命周期 / #6-多-client-并发请求 / #7-daemon-部署模式clihttpserver-vs-headlesshttpserver 全部 resolve 到 §02 实际 header(GFM slugify 脚本验证)。 Round 3/4 polish(1 处): G. Line 45 cross-chapter ref \`[§14]\` → \`[§14 Orchestrator 多租户与配额]\` (其他 cross-chapter ref 都含名称,line 45 单独缺名称不一致) Round 5(最终扫描): - 8 个 external anchor refs 全部 resolve ✓ - 无 placeholder / 空括号 / 错字 / 陈旧措辞(如"雏形"/"prototype") - 7 个决策章节 + §4.5 + 决策矩阵汇总 全部 PR 对应映射齐全
Summary
基于 12 篇深度工程文章(Anthropic 10 篇 + Block 1 篇 + 社区 1 篇),为 7 篇文档补充实质性内容。所有引语从原文精确提取,附带来源 URL 和日期。
新增内容分布
来源文章清单
PR #6 教训的应用
> "..."无前缀Test plan
🤖 Generated with Claude Code