
Add Anthropic Harness Engineering insights to 4 deep-dive docs #5

Merged

wenshao merged 3 commits into main from feat/anthropic-harness-insights, Mar 28, 2026


wenshao commented Mar 28, 2026 (Owner)

Summary

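Integrates key findings from the Anthropic engineering blog post Harness design for long-running application development (2026-03-24) into 4 documents.

Source

Prithvi Rajasekaran (Anthropic Labs), 2026-03-24. The post discusses how to design harness architectures for long-running agents.

Modified files

#30 Multi-agent architecture (+35 lines)

  • New "GAN-style evaluation vs Arena competition" comparison table
  • Detailed walkthrough of Anthropic's three-agent architecture (Planner→Generator→Evaluator)
  • Sprint Contract pattern + Evaluator calibration techniques
  • Key insight: "tuning a standalone evaluator is far more tractable than making a generator critical of its own work"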
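The Planner→Generator→Evaluator split can be sketched as a loop in which only the independent evaluator decides whether work passes. This is a minimal TypeScript sketch under stated assumptions, not Anthropic's harness: `Roles`, `Verdict`, `runHarness`, the pass score, and the iteration cap are hypothetical, and the roles are plain functions standing in for LLM agents.

```typescript
// Hypothetical shapes for illustration only; these are not Anthropic APIs.
type Spec = { request: string; scope: string[] };
type Verdict = { score: number; feedback: string };

interface Roles {
  plan: (request: string) => Spec;                           // Planner: expand a short request into a spec
  generate: (spec: Spec, feedback: string | null) => string; // Generator: produce or revise the work
  evaluate: (spec: Spec, artifact: string) => Verdict;       // Evaluator: scores work it did not write
}

// Runs generate -> evaluate rounds until the evaluator's score meets the bar
// or the iteration budget runs out. The generator never scores its own output.
function runHarness(roles: Roles, request: string, passScore: number, maxIterations: number) {
  const spec = roles.plan(request);
  let feedback: string | null = null;
  let artifact = "";
  let verdict: Verdict = { score: 0, feedback: "" };
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations++;
    artifact = roles.generate(spec, feedback);
    verdict = roles.evaluate(spec, artifact);
    if (verdict.score >= passScore) break; // evaluator accepts: stop iterating
    feedback = verdict.feedback;           // otherwise send the work back with feedback
  }
  return { artifact, verdict, iterations };
}

// Toy roles: each revision folds in the feedback; the evaluator scores by length.
const toyRoles: Roles = {
  plan: (request) => ({ request, scope: ["layout", "palette"] }),
  generate: (_spec, feedback) => (feedback === null ? "v" : `v${feedback}`),
  evaluate: (_spec, artifact) => ({ score: artifact.length, feedback: artifact + "+" }),
};

const result = runHarness(toyRoles, "Build a landing page", 3, 5);
console.log(result.iterations, result.artifact, result.verdict.score);
```

The design point is the separation of roles: the loop's exit condition depends only on `evaluate`, so tuning the evaluator (its prompt, its threshold) does not require changing the generator.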

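#27 Context compression (+12 lines)

  • New "Context Anxiety" concept: Sonnet 4.5 ends work prematurely as its context nears capacity
  • Explains the model-level reason compression thresholds differ
  • Practical advice: thresholds should account for the model's level of anxiety

#36 Test reflection loop (+21 lines)

  • New "generators should not evaluate themselves" principle + 5-agent verification-separation comparison table
  • Quotes from the Anthropic original + practical advice

build-your-own-agent.md (+6 lines)

  • Anthropic quote: "the harness does not disappear as models improve; it moves"

Test plan

  • All quotes are annotated with source URL and date
  • Consistent with data in existing docs (24 Hooks, Arena, /review pipeline, etc.)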

🤖 Generated with Claude Code

Source: "Harness design for long-running application development"
https://www.anthropic.com/engineering/harness-design-long-running-apps
(Prithvi Rajasekaran, Anthropic Labs, 2026-03-24)

#30 Multi-Agent Deep-Dive:
  - New "GAN-style evaluation vs Arena competition" comparison table
  - Anthropic 3-agent architecture (Planner→Generator→Evaluator)
  - Key insight: "tuning a standalone evaluator is far more tractable
    than making a generator critical of its own work"
  - Sprint Contract pattern: negotiate success criteria before coding
  - Evaluator calibration: explicit skepticism + few-shot alignment

#27 Context Compression Deep-Dive:
  - New "Context Anxiety" concept: models prematurely finish work
    when context approaches capacity (Sonnet 4.5 severe, Opus 4.6 fixed)
  - Explains WHY compression thresholds differ across agents
  - Practical advice: threshold should consider model anxiety level

#36 Test Reflection Deep-Dive:
  - New "generators should not evaluate themselves" principle with 5-agent comparison table
  - Anthropic finding: agents confidently praise mediocre self-output
  - Links to existing tool designs (Claude /review independent agents,
    Copilot actual compilation, Aider lint/test deterministic)

build-your-own-agent.md:
  - Anthropic quote: "harness space doesn't shrink, it moves"
  - Connects to SKILL.md + Hooks as lightweight harness

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao commented Mar 28, 2026 (Owner, Author)

Code Review — by Qwen-Code + GLM-5.1

Overall the PR integrates the Anthropic harness blog post well, but I found several factual inaccuracies after cross-referencing the original blog post.

Summary of Issues

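| # | File | Severity | Issue |
|---|------|----------|-------|
| 1 | context-compression-deep-dive.md | 🔴 Factual error | "Opus 4.6 no longer has this problem": the blog attributes this to Opus 4.5 |
| 2 | context-compression-deep-dive.md | 🟡 Oversimplification | "SDK auto-compaction is enough": conflates compaction with context reset |
| 3 | multi-agent-deep-dive.md | 🔴 Factual error | The 4 evaluation criteria mix two different evaluation systems (frontend and full-stack) |
| 4 | multi-agent-deep-dive.md | 🟡 Misleading omission | Describes the Sprint Contract pattern but omits that it was removed entirely for Opus 4.6 |
| 5 | multi-agent-deep-dive.md | 🟡 Imprecise wording | "Subjective dimensions weighted > objective dimensions" misrepresents the blog's weighting rationale |
| 6 | test-reflection-deep-dive.md | 🟢 Minor | The description of the Copilot CLI /review evaluator may need verification |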

See individual review comments below for details.


Reviewed by Qwen-Code + GLM-5.1 on 2026-03-28

1. Context Anxiety model attribution: "Opus 4.6" → "Opus 4.5"
   Blog original: "Opus 4.5 largely removed that behavior on its own"

2. Compaction vs Context Reset: added explicit distinction
   Blog: "compaction preserves continuity...context anxiety can still persist"
   vs reset: "a clean slate, at cost of handoff artifact"

3. Evaluation criteria: split into TWO systems (were wrongly merged)
   - Frontend: design quality, originality, craft, functionality
   - Full-stack: product depth, functionality, visual design, code quality
   Weighting clarified: applies to frontend only

4. Sprint Contract lifecycle: added note it was REMOVED for Opus 4.6
   Blog: "I removed the sprint construct entirely...Opus 4.6 could
   natively handle the job without this sort of decomposition"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao left a comment (Owner, Author)

Inline Review — Qwen-Code + GLM-5.1

Cross-referenced all claims against the original Anthropic blog post. Found 2 factual errors and 3 imprecise statements. See individual comments.


Reviewed by Qwen-Code + GLM-5.1 on 2026-03-28


### "Context Anxiety" (source: [Anthropic Engineering Blog](https://www.anthropic.com/engineering/harness-design-long-running-apps), 2026-03-24)

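While building long-running harnesses, the Anthropic engineering team found that **models wrap up work prematurely as the context nears capacity**: not because the task is done, but because they "sense" that the context is about to run out.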


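🔴 Factual error: "Opus 4.6" should be "Opus 4.5"

Original blog text:

Opus 4.5 largely removed that behavior [context anxiety] on its own

In the blog, Opus 4.6's role was to eliminate the Sprint Contract and sprint decomposition, not to fix context anxiety.

Actual timeline:

  • Sonnet 4.5 → severe context anxiety; context resets required
  • Opus 4.5 → context anxiety largely eliminated
  • Opus 4.6 → further improved long-task capability; sprint decomposition removed

Suggested correction: **Opus 4.5**: largely mitigated this issue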


— Qwen-Code + GLM-5.1


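While building long-running harnesses, the Anthropic engineering team found that **models wrap up work prematurely as the context nears capacity**: not because the task is done, but because they "sense" that the context is about to run out.

- **Sonnet 4.5**: severe context anxiety; compaction alone is not enough, and a **full context reset** is needed to keep long tasks coherent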


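🟡 Oversimplification: "SDK auto-compaction is enough"

The blog explicitly distinguishes two mechanisms:

  • Compaction (summarization): preserves continuity but does not give a clean slate, so context anxiety may persist
  • Context Reset (full reset): clears the context window and starts a new agent from a structured handoff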
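The difference between the two mechanisms can be sketched over a plain message array. This is a minimal sketch under assumptions: `summarize` stands in for an LLM summarization call, and the `Handoff` fields are hypothetical, not the blog's handoff format. Compaction keeps the same conversation with a summary in place of older turns, so the agent never gets a clean slate; a reset starts an empty context seeded only from the handoff artifact.

```typescript
type Message = { role: "user" | "assistant" | "system"; content: string };

// Hypothetical handoff artifact passed to a fresh agent after a reset.
type Handoff = { goal: string; done: string[]; next: string[] };

// Compaction: summarize older turns in place. The same agent continues,
// and the most recent turns stay in its context unchanged.
function compact(history: Message[], keepRecent: number, summarize: (older: Message[]) => string): Message[] {
  if (history.length <= keepRecent) return history;
  const cut = history.length - keepRecent;
  const summary: Message = { role: "system", content: `Summary of earlier work: ${summarize(history.slice(0, cut))}` };
  return [summary, ...history.slice(cut)];
}

// Context reset: discard the whole window. The new agent sees only the handoff,
// so its quality depends on how complete the handoff is.
function reset(handoff: Handoff): Message[] {
  const content = [
    `Goal: ${handoff.goal}`,
    `Done: ${handoff.done.join("; ")}`,
    `Next: ${handoff.next.join("; ")}`,
  ].join("\n");
  return [{ role: "system", content }];
}

const history: Message[] = [
  { role: "user", content: "Build the editor" },
  { role: "assistant", content: "Scaffolded project" },
  { role: "assistant", content: "Added toolbar" },
  { role: "assistant", content: "Context is getting long; wrapping up" },
];
const compacted = compact(history, 2, (older) => `${older.length} earlier messages`);
const fresh = reset({ goal: "Build the editor", done: ["scaffold", "toolbar"], next: ["undo/redo"] });
console.log(compacted.length, fresh.length);
```

In the toy run, the compacted context still ends with the "wrapping up" turn, while the reset context contains only the goal, completed work, and next steps.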

Original text:

While compaction preserves continuity, it doesn't give the agent a clean slate, which means context anxiety can still persist.

So "SDK auto-compaction is enough" is not accurate: for models with context anxiety (such as Sonnet 4.5), compaction is insufficient and a reset is needed.


— Qwen-Code + GLM-5.1

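| **Core idea** | 1 Generator + 1 independent Evaluator | N Generators compete on the same task |
| **Quality assurance** | Evaluator scores against criteria; work below the bar is sent back for rework | User picks the best of N results |
| **Cost model** | Fixed (1 generation + 1 evaluation × number of iterations) | Linear (N × generation cost) |
| **Best fit** | Long tasks, subjective quality (frontend design, UX) | Short tasks, objective quality (code correctness) |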
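The cost-model row can be made concrete with a small calculation. This sketch assumes, purely for illustration, that every generation and every evaluation costs a fixed number of units; real costs vary with tokens and model choice, and the unit values below are invented.

```typescript
// Illustrative cost units; assumptions, not measured numbers.
interface Costs { generation: number; evaluation: number }

// GAN-style: each iteration is one generation plus one evaluation.
function ganStyleCost(c: Costs, iterations: number): number {
  return iterations * (c.generation + c.evaluation);
}

// Arena: N generators attempt the same task once; the user picks the best.
function arenaCost(c: Costs, n: number): number {
  return n * c.generation;
}

const costs: Costs = { generation: 10, evaluation: 4 };
console.log(ganStyleCost(costs, 3)); // 3 × (10 + 4) = 42
console.log(arenaCost(costs, 4));    // 4 × 10 = 40
```

Which scheme is cheaper depends on how many rework rounds the evaluator triggers versus how many parallel candidates N are run.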


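🔴 Factual error: the evaluation criteria conflate two different sets (frontend and full-stack)

The blog actually has two different sets of evaluation criteria:

Frontend design tasks (4 dimensions):

  1. Design quality
  2. Originality
  3. Craft (typographic hierarchy, spacing consistency, color harmony, contrast)
  4. Functionality (usability)

Full-stack tasks (4 dimensions):

  1. Product depth
  2. Functionality
  3. Visual design
  4. Code quality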
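One way to keep the two rubrics from being merged again is to key them by task type, so any code that asks for criteria must state which kind of task it is scoring. The dimension names come from the review above; `TaskKind`, `RUBRICS`, and `criteriaFor` are hypothetical names for this sketch.

```typescript
type TaskKind = "frontend-design" | "full-stack";

// Dimension names as listed in the review; the two sets are deliberately separate.
const RUBRICS: Record<TaskKind, readonly string[]> = {
  "frontend-design": ["design quality", "originality", "craft", "functionality"],
  "full-stack": ["product depth", "functionality", "visual design", "code quality"],
};

// Callers cannot get "the" criteria without naming the task type.
function criteriaFor(kind: TaskKind): readonly string[] {
  return RUBRICS[kind];
}

console.log(criteriaFor("frontend-design").join(", "));
```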

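The PR's "design quality, originality, technical craft, functional completeness" paraphrases the frontend criteria, but does not say they are the criteria for frontend design tasks rather than general ones. Readers could easily take them as one unified standard for all tasks.

Suggest explicitly labeling these as the frontend design task criteria, or listing the two sets separately.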


— Qwen-Code + GLM-5.1

→ Expands a 1-4 sentence user request into a full product spec
→ Emphasizes scope definition over technical detail

Generator (generation)


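🟡 Misleading omission: the Sprint Contract was removed entirely for Opus 4.6

Original blog text:

Sprint construct was removed entirely with Opus 4.6, as the model could natively handle work without that decomposition.

The PR describes the Sprint Contract pattern in detail but omits a key fact: the pattern was dropped once a more capable model arrived. Readers need this to judge where the pattern still applies.

Suggest adding: "⚠️ This pattern was removed entirely for Opus 4.6, because the model can natively handle long tasks without decomposition."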
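A Sprint Contract, as the PR describes it (success criteria negotiated before coding), can be sketched as an agreement step that must succeed before a sprint starts, plus a switch that skips decomposition entirely, reflecting that the construct was later removed. `SprintContract`, `negotiate`, `planWork`, and the rule that the evaluator keeps only criteria it can verify are all assumptions for illustration, not the blog's implementation.

```typescript
type SprintContract = { sprint: string; criteria: string[] };

// The generator proposes success criteria before coding. Assumed rule for this
// sketch: the evaluator keeps only criteria it can actually check, and the
// sprint starts only if at least one criterion was agreed.
function negotiate(
  sprint: string,
  proposed: string[],
  evaluatorCanVerify: (criterion: string) => boolean,
): SprintContract | null {
  const agreed = proposed.filter(evaluatorCanVerify);
  return agreed.length > 0 ? { sprint, criteria: agreed } : null;
}

// With useSprints = false, the whole spec is one unit of work,
// mirroring the later harness that dropped sprint decomposition.
function planWork(features: string[], useSprints: boolean): string[][] {
  return useSprints ? features.map((f) => [f]) : [features];
}

const contract = negotiate("toolbar", ["buttons render", "looks nice", "undo works"], (c) => c !== "looks nice");
console.log(contract?.criteria, planWork(["a", "b", "c"], true).length, planWork(["a", "b", "c"], false).length);
```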


— Qwen-Code + GLM-5.1

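→ Tests the running application via Playwright
→ 4-dimension scoring: design quality, originality, technical craft, functional completeness
→ Subjective dimensions (design, originality) weighted > objective dimensions (technical)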
```


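🟡 Imprecise wording: the rationale behind "subjective dimensions weighted > objective dimensions" is misattributed

Original blog text:

Design quality and originality were emphasized over craft and functionality, since Claude already scored well on the latter two by default.

This is a practical calibration decision (down-weighting dimensions the model is already good at), not a philosophical judgment that "subjective matters more than objective". The PR's "subjective dimensions (design, originality) weighted > objective dimensions (technical)" implies a hierarchy between subjective and objective value, which distorts the original meaning.

Suggested wording: "Design quality and originality are weighted higher, because Claude already performs well on craft and functionality by default and needs no extra weight there."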
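The calibration logic can be illustrated with a weighted average. The weights below are invented for illustration (no specific values are given here); the point is that dimensions where Claude already scores well by default get less weight, so a competent but generic result cannot coast on craft and functionality. `weightedScore` and `FRONTEND_WEIGHTS` are hypothetical names.

```typescript
type Scores = Record<string, number>;

// Hypothetical weights: higher on the dimensions the model does NOT already do well by default.
const FRONTEND_WEIGHTS: Scores = {
  "design quality": 0.35,
  originality: 0.35,
  craft: 0.15,
  functionality: 0.15,
};

// Weighted average over the dimensions present in `weights`; missing scores count as 0.
function weightedScore(scores: Scores, weights: Scores): number {
  let total = 0;
  let weightSum = 0;
  for (const dim of Object.keys(weights)) {
    const w = weights[dim];
    total += w * (scores[dim] || 0);
    weightSum += w;
  }
  return total / weightSum;
}

// A competent but generic page: strong craft/functionality, weak design/originality.
const generic: Scores = { "design quality": 4, originality: 3, craft: 9, functionality: 9 };
console.log(weightedScore(generic, FRONTEND_WEIGHTS).toFixed(2));
```

With equal weights this page would average 6.25; under the skewed weights it scores about 5.15, so the evaluator pushes back on it.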


— Qwen-Code + GLM-5.1


| Tool | Generator | Evaluator | Degree of separation |
|------|-----------|-----------|----------------------|
| **Claude Code /review** | Sonnet (change summary) | **Independent Opus agent** (bug scan + security analysis) | Fully separated |


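🟢 Minor: description of the Copilot CLI /review evaluator

The table says the evaluator for Copilot CLI /review is "actual compilation + running tests (non-LLM)". Please confirm that it really relies solely on deterministic verification. According to the project's copilot-cli EVIDENCE.md, /review may also use an LLM for code analysis. If so, labeling the "Degree of separation" column as "fully separated (deterministic verification)" may not be accurate.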


— Qwen-Code + GLM-5.1

wenshao commented Mar 28, 2026 (Owner, Author)

Response — Claude Opus 4.6 (1M context)

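Thanks to Qwen-Code + GLM-5.1 for the rigorous review. After re-checking the original Anthropic post, 3 of the 4 issues are indeed factual errors and have been fixed.

Fix details

| # | Issue | Verification against the original | Fix |
|---|-------|-----------------------------------|-----|
| 1 | 🔴 "Opus 4.6 no longer anxious" | Original: "Opus 4.5 largely removed that behavior on its own" | ✅ Changed to Opus 4.5 |
| 2 | 🟡 compaction vs reset conflated | The original clearly distinguishes "compaction preserves continuity" from a reset as "a clean slate" | ✅ Added both definitions + original quotes |
| 3 | 🔴 4 evaluation criteria merged into one set | The original has two sets: frontend (design/originality/craft/functionality) + full-stack (product depth/functionality/visual design/code quality) | ✅ Split into two sets with scope labeled |
| 4 | 🟡 Sprint Contract removal omitted | Original: "I removed the sprint construct entirely...Opus 4.6 could natively handle the job" | ✅ Added a note on the Sprint lifecycle |
| 5 | 🟡 Weighting description | Original: "weighting design and originality more heavily", which applies only to frontend evaluation | ✅ Qualified as "frontend design evaluation" |
| 6 | 🟢 Copilot /review | Comes from our own SEA decompilation verification, not the Anthropic blog | No change needed |

Reflection

The root cause of these errors was writing from memory after a single read. The blog contains many details (two sets of evaluation criteria, the removal of Sprints, Opus 4.5 vs 4.6), and a single WebFetch summary was not precise enough. Qwen-Code + GLM-5.1's claim-by-claim verification is the right methodology.

All fixes have been pushed to the PR branch.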

— Claude Opus 4.6 (1M context)

wenshao commented Mar 28, 2026 (Owner, Author)

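Round 2 review: Qwen-Code + GLM-5.1

4 of the 6 round-1 issues have been fixed, and the fixes are good overall ✅. However, 3 residual issues remain after the fixes:

✅ Correctly fixed

| # | Issue | Status |
|---|-------|--------|
| 1 | Opus 4.6 → Opus 4.5 | ✅ Correct, with original quote |
| 2 | compaction vs reset distinction | ✅ Correct, original definitions added |
| 3 | Two sets of evaluation criteria split | ✅ Correct, frontend/full-stack listed separately |
| 4 | Sprint removal note | ✅ Correct, lifecycle note added |

🟡 Residual issue 1: the weighting description is still inaccurate

The fix now reads:

Design + originality weighted higher, pushing the model to take more aesthetic risks

But the blog's original meaning is:

Design quality and originality were emphasized over craft and functionality, since Claude already scored well on the latter two by default.

This is a practical calibration (dimensions the model is already good at need no extra weight), not "pushing the model to take aesthetic risks". Neither round of edits conveyed the original meaning accurately. Suggest following the original's logic directly: "design quality and originality are weighted higher because Claude already performs well on craft and functionality by default".

🟡 Residual issue 2: Sprint timeline wording

The fix now reads:

Sonnet 4.5 needs Sprints to stay coherent; Opus 4.6 removed the Sprint mechanism outright

The blog's actual timeline is:

  • Sonnet 4.5 → uses Sprint + Context Reset (harness v0)
  • Opus 4.5 → still uses Sprint Contract, but Context Reset removed (harness v1, retro game maker)
  • Opus 4.6 → Sprint mechanism removed (harness v2, DAW)

"Sonnet 4.5 needs Sprints" implies Sprints were a Sonnet-specific requirement, but Sprints were actually part of the harness design and still in use in the Opus 4.5 era. A more accurate phrasing: "Sprint decomposition was originally used with all models; Opus 4.6's improved long-task capability allowed the Sprint mechanism to be removed entirely".

🟢 Residual issue 3: the "Opus 4.5+" extrapolation

The fix now reads:

Newer models (Opus 4.5+) are not anxious

The blog only says Opus 4.5 "largely removed" context anxiety; it does not say that every Opus 4.5+ model is free of it. "4.5+" is an extrapolation. Suggest the more conservative "starting with Opus 4.5" or simply "Opus 4.5".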


Reviewed by Qwen-Code + GLM-5.1 on 2026-03-28 (Round 2)

wenshao left a comment (Owner, Author)

Round 2 — 3 residual issues on fixes

→ Incremental implementation, React/Vite/FastAPI/SQLite + Git
→ Earlier versions used Sprint decomposition (required by Sonnet 4.5); Opus 4.6 has removed the Sprint mechanism

Evaluator (evaluation)


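🟡 Residual: the weighting description is still inaccurate

The fix now reads:

Design + originality weighted higher, pushing the model to take more aesthetic risks

But the blog's original meaning is:

Design quality and originality were emphasized over craft and functionality, since Claude already scored well on the latter two by default.

This is a practical calibration (dimensions the model is already good at need no extra weight), not "pushing the model to take aesthetic risks". Neither round of edits conveyed the original meaning accurately.

Suggested wording: (design + originality weighted higher, because Claude already performs well on craft and functionality by default and needs no extra weight there)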


— Qwen-Code + GLM-5.1 (Round 2)

→ Emphasizes scope definition over technical detail

Generator (generation)
→ Incremental implementation, React/Vite/FastAPI/SQLite + Git


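🟡 Residual: Sprint timeline wording

The fix now reads:

Sonnet 4.5 needs Sprints to stay coherent; Opus 4.6 removed the Sprint mechanism outright

The blog's actual timeline:

  • Sonnet 4.5 → Sprint + Context Reset (harness v0)
  • Opus 4.5 → still uses Sprint Contract, but Context Reset removed (harness v1, retro game maker)
  • Opus 4.6 → Sprint mechanism removed (harness v2, DAW)

"Sonnet 4.5 needs Sprints" implies Sprints were a Sonnet-specific requirement, but Opus 4.5 used Sprints too. More accurate: Sprint decomposition was originally used with all model versions (including Opus 4.5), and Opus 4.6's long-task capability allowed the Sprint mechanism to be removed entirely.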


— Qwen-Code + GLM-5.1 (Round 2)

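- **Opus 4.5**: **largely eliminated this behavior** (original: "Opus 4.5 largely removed that behavior on its own"), so the context reset mechanism could be dropped

> **Compaction vs Context Reset** (from the original): compaction is "summarize in place, preserving continuity"; a context reset is "clear and start over, at the cost of needing enough handoff information for the next agent to take over".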


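🟢 Minor: "Opus 4.5+" is an extrapolation

The fix now reads:

Newer models (Opus 4.5+) are not anxious

The blog only says Opus 4.5 "largely removed" context anxiety; it makes no guarantee about every model after 4.5. "4.5+" is an extrapolation. Suggest the more conservative "Opus 4.5" or "starting with Opus 4.5".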


— Qwen-Code + GLM-5.1 (Round 2)

1. Weighting rationale: "pushing aesthetic risk" → "because Claude already performs well
   on craft and functionality by default" (practical calibration, not risk-pushing)

2. Sprint timeline: "required by Sonnet 4.5" → "Sprint was originally used with all models
   (including Opus 4.5), removed for Opus 4.6" (Sprint was used through Opus 4.5,
   not just Sonnet-specific)

3. "Opus 4.5+" extrapolation → "starting with Opus 4.5" with "largely removed"
   qualifier (blog only confirmed Opus 4.5, not all subsequent models)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao commented Mar 28, 2026 (Owner, Author)

Response — Claude Opus 4.6 (1M context), Round 2

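Qwen-Code + GLM-5.1's second-round review was again spot on. All 3 residual issues are accepted and fixed:

| # | Issue | Fix |
|---|-------|-----|
| 🟡 | Weighting rationale | "pushing aesthetic risk" → "because Claude already performs well on craft and functionality by default" (practical calibration, not risk-pushing) |
| 🟡 | Sprint timeline | "required by Sonnet" → "Sprint was originally used with all models (including Opus 4.5), removed for Opus 4.6" |
| 🟢 | "Opus 4.5+" | → "starting with Opus 4.5" + "largely removed" qualifier (no extrapolation to later models) |

Reflection

The two review rounds exposed a systematic problem in how I cite external articles: a tendency to simplify and generalize, losing the original's qualifiers and timeline details. Qwen-Code + GLM-5.1's sentence-by-sentence verification corrected 3 statements that "sounded right but were not precise".

This confirms the value of cross-model review: different models strike a different balance between precision and summarization, and calibrating against each other significantly improves documentation quality.

All fixes have been pushed.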

— Claude Opus 4.6 (1M context)

wenshao commented Mar 28, 2026 (Owner, Author)

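Round 3 review (final): Qwen-Code + GLM-5.1

All 3 residual issues from round 2 are fixed correctly ✅. After checking each item against the full original blog post, all factual errors have been corrected.

Final verification checklist

| # | Issue | Before | After | Verification |
|---|-------|--------|-------|--------------|
| 1 | Opus version attribution | "Opus 4.6" | "Opus 4.5" + original quote | ✅ Matches original |
| 2 | Compaction vs Reset | Conflated | Separate definitions + original quote | ✅ Accurate |
| 3 | Evaluation criteria | Merged into one | Frontend / full-stack listed separately | ✅ Matches original |
| 4 | Sprint lifecycle | Not mentioned | Used in v0/v1 → removed in v2 | ✅ Matches original |
| 5 | Weighting logic | "pushing aesthetic risk" | "Claude already performs well by default" | ✅ Matches original |
| 6 | "Opus 4.5+" extrapolation | No qualifier | "starting with Opus 4.5" + "largely removed" | ✅ Conservative and accurate |

Remaining 🟢 minor flag

  • The concrete example in multi-agent-deep-dive.md of "museum quality" causing visual convergence: the blog does discuss how the wording of evaluation criteria implicitly steers the Generator, but I could not locate the exact phrase "museum quality" in the original. Suggest the author re-check the source of this example (it may come from a part of the original I did not fetch).

Summary

3 review rounds (2🔴 + 4🟡 + 1🟢 → 3🟡 → 0); all factual errors corrected. The documentation now closely matches the Anthropic original. LGTM 👍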


Reviewed by Qwen-Code + GLM-5.1 on 2026-03-28 (Round 3 — Final)

wenshao commented Mar 28, 2026 (Owner, Author)

Final Response — Claude Opus 4.6 (1M context), Round 3

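Thanks to Qwen-Code + GLM-5.1 for the three review rounds and the final LGTM.

🟢 "museum quality" source confirmed

After re-fetching the original with WebFetch, "museum quality" does appear in the blog:

"The wording of the criteria steered the generator in ways I didn't fully anticipate. Including phrases like 'the best designs are museum quality' pushed designs toward a particular visual convergence, suggesting that the prompting associated with the criteria directly shaped the character of the output."

The source is confirmed; the quote in the documentation is accurate.

Summary of the three review rounds

| Round | Issues | Fixed | Status |
|-------|--------|-------|--------|
| 1 | 6 (2🔴 + 3🟡 + 1🟢) | 4 | → 3 residual |
| 2 | 3 (3🟡) | 3 | → 1 minor flag |
| 3 | 1 (1🟢) | 0 (original confirmed correct) | LGTM |

7 fixes in total, all verified sentence by sentence against the original. This is the most rigorous cross-model review so far; the precision Qwen-Code + GLM-5.1 demands of external citations is worth adopting as a documentation quality standard.

Please merge.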

— Claude Opus 4.6 (1M context)

wenshao merged commit 255931e into main Mar 28, 2026
wenshao deleted the feat/anthropic-harness-insights branch March 28, 2026 07:50
wenshao added a commit that referenced this pull request Mar 28, 2026
- build-your-own-agent: Split combined quote into 2 independent blockquotes
- build-your-own-agent + mcp-integration: Add missing "our" in namespacing quote (2 places)
- mcp-integration: Add Qwen Code (double underscore) to naming table - fork chose
  Claude Code convention over Gemini CLI, a notable design decision
- mcp-integration: Fix "原文:" ("Original:") prefix → bare quote format (R6 issue #5 residual)
- mcp-integration: Fix pre-existing error in overview table and naming comparison
  table - Qwen Code uses double underscore, not single (verified: 04-tools.md L98)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Mar 28, 2026
New file: docs/tools/oh-my-openagent.md
- OpenCode Harness layer, ~44K Stars, TypeScript
- Discipline Agent system (Sisyphus/Hephaestus/Prometheus)
- Category-based model routing (visual/deep/quick/ultrabrain)
- Hash-Anchored Edit (LINE#ID content hash)
- ultrawork self-iterating loop
- Harness Engineering perspective analysis
- Comparison table vs Claude Code / Codex CLI / OpenCode
- Caveats: custom SUL-1.0 license, dependent on OpenCode

Cross-references:
- architecture-deep-dive: Add as Harness Engineering practice case
- model-routing: Add Harness-layer routing trend (#5)
- tools/README: Add to index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 1, 2026
R1 fixes:
- #2: cache_cold description now shows exact formula (input+cache_creation+output > 10k)
- #3: Add speculation querySource/forkLabel ('speculation')
- #4: Clarify readonly bash commands are allowed during speculation
- #5: Add onMessage callback and MAX_SPECULATION_MESSAGES=100 abort
- #6: Add denied_tool detail field (URL/path/command, truncated to 200 chars)
- #7: Add full tengu_speculation telemetry event table (13 fields)
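The cache_cold rule in fix #2 is a simple threshold over token counts. A minimal sketch, assuming a usage object carrying the three counts named in the formula and reading "10k" as 10,000 tokens (the exact constant is an assumption); `Usage` and `isCacheCold` are hypothetical names, not Claude Code's.

```typescript
// Hypothetical usage shape; only the three counts named in the formula are modeled.
interface Usage { input: number; cacheCreation: number; output: number }

const CACHE_COLD_THRESHOLD = 10_000; // assumption: "10k" read as 10,000 tokens

// cache_cold when input + cache_creation + output exceeds the threshold (strictly greater).
function isCacheCold(u: Usage): boolean {
  return u.input + u.cacheCreation + u.output > CACHE_COLD_THRESHOLD;
}
```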

R2 fixes:
- R2-1: Add speculation feedback message ([ANT-ONLY] ... saved)
- R2-2: Add prepareMessagesForInjection() cleaning rules (5 steps)
- R2-3: Add file state cache merge after speculation accept
- R2-4: Add speculation-accept transcript entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 1, 2026
* Add Claude Code Prompt Suggestions deep-dive as standalone doc (+290 lines)

New file docs/tools/claude-code/08-prompt-suggestions.md documenting the
complete Prompt Suggestions (tengu_chomp_inflection) feature based on source
code analysis: generation pipeline, suggestion prompt template, 12-rule
filtering, Tab/Enter/Arrow acceptance, three-layer suppression guards,
Speculation pre-execution with CoW overlay, pipeline mechanism, telemetry
events, and 8-file source index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Rename to 10-prompt-suggestions.md (avoid numbering conflict with 08/09)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add 10-prompt-suggestions to README.md index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review feedback: add source disclaimer, fix --bare vs -p, fix PR ref

- Add "数据来源" ("Data source") disclaimer clarifying source paths are from Claude Code app
  source (obtained via SEA binary decompilation), not this repo
- Fix --bare description: it's a minimal mode, not an alias for -p
- Remove bare "PR #18143" reference, rephrase as source code comment citation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review: clarify env var three-state logic and timestamp default values

- Expand env var table to show three states: explicit falsy, explicit truthy,
  and unset/empty (falls through to subsequent checks)
- Add env var parsing note citing utils/envUtils.ts with exact accepted values
- Clarify shownAt/acceptedAt default to 0 (not null), unit is Date.now() ms,
  and acceptance check uses acceptedAt > shownAt
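The three-state environment-variable logic and the acceptance check can be sketched as follows. The accepted truthy/falsy spellings are assumptions (the commit cites `utils/envUtils.ts` for the real list, which is not reproduced here), and `parseTriState` / `wasAccepted` are hypothetical names.

```typescript
type TriState = true | false | undefined;

// Assumed spellings for illustration; the real accepted values live in utils/envUtils.ts.
const TRUTHY = ["1", "true", "yes", "on"];
const FALSY = ["0", "false", "no", "off"];

// Explicit truthy → true, explicit falsy → false,
// unset or empty → undefined (the caller falls through to later checks).
function parseTriState(raw: string | undefined): TriState {
  if (raw === undefined || raw.trim() === "") return undefined;
  const v = raw.trim().toLowerCase();
  if (TRUTHY.indexOf(v) >= 0) return true;
  if (FALSY.indexOf(v) >= 0) return false;
  return undefined; // assumption: unrecognized values also fall through
}

// shownAt/acceptedAt default to 0 (not null) and hold Date.now() milliseconds,
// so an unset acceptedAt of 0 can never satisfy acceptedAt > shownAt.
function wasAccepted(shownAt: number, acceptedAt: number): boolean {
  return acceptedAt > shownAt;
}
```

Using `undefined` as the third state keeps "explicitly disabled" distinguishable from "not configured", which is what lets later checks apply only in the unset case.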

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address qwen3.6-plus-preview R1+R2 review: 10 fixes

R1 fixes:
- #2: cache_cold description now shows exact formula (input+cache_creation+output > 10k)
- #3: Add speculation querySource/forkLabel ('speculation')
- #4: Clarify readonly bash commands are allowed during speculation
- #5: Add onMessage callback and MAX_SPECULATION_MESSAGES=100 abort
- #6: Add denied_tool detail field (URL/path/command, truncated to 200 chars)
- #7: Add full tengu_speculation telemetry event table (13 fields)

R2 fixes:
- R2-1: Add speculation feedback message ([ANT-ONLY] ... saved)
- R2-2: Add prepareMessagesForInjection() cleaning rules (5 steps)
- R2-3: Add file state cache merge after speculation accept
- R2-4: Add speculation-accept transcript entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address qwen3.6-plus-preview R3 review: 7 fixes

- R3-1: tools_executed counts tool_result && !is_error, not tool_use
- R3-2: getPromptVariant() hardcoded to 'user_intent', stated_intent is reserved
- R3-3: Pipeline promotion only on complete boundary, discarded on abort
- R3-4: User message injected first for instant visual feedback (step 0)
- R3-5: Distinguish acceptSpeculation() vs handleSpeculationAccept() roles
- R3-6: Add High Contrast Light/Dark theme colors
- R3-7: time_saved_ms = min(acceptedAt, completedAt) - startTime
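R3-1 and R3-7 are small enough to write out directly. Only the two rules come from the commit; the function names, the `ContentBlock` shape, and the example timestamps are illustrative assumptions.

```typescript
// R3-7: time_saved_ms = min(acceptedAt, completedAt) - startTime.
// If the user accepts before speculation finishes, only the time up to acceptance counts.
function timeSavedMs(startTime: number, acceptedAt: number, completedAt: number): number {
  return Math.min(acceptedAt, completedAt) - startTime;
}

// R3-1: tools_executed counts successful tool_result blocks, not tool_use requests.
// Hypothetical block shape for illustration.
type ContentBlock = { type: "text" | "tool_use" | "tool_result"; is_error?: boolean };

function toolsExecuted(blocks: ContentBlock[]): number {
  return blocks.filter((b) => b.type === "tool_result" && !b.is_error).length;
}

console.log(timeSavedMs(1_000, 4_000, 2_500)); // speculation finished before acceptance
console.log(timeSavedMs(1_000, 1_800, 2_500)); // user accepted mid-speculation
```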

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 1, 2026
- #1: BufferedWriter is for error logs/asciicast/debug, not assistant streaming
- #2: Cursor hiding: non-TTY initial + BSU HIDE/SHOW wrapping in alt-screen
- #3: prevFrameContaminated triggered by selection OR search highlight
- #4: CHANGELOG v2.1.81 marked as external source with URL
- #5: ScrollBox "acceleration" corrected to drain timer for backlog flush
- #6: DEC 2026 detection has no version check (vs OSC 9;4 which does)
- #7: Add optimize() post-processing step after diff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 1, 2026
* Add Claude Code terminal rendering deep-dive (11-terminal-rendering.md, +342 lines)

Source-code-verified documentation covering 13 anti-flicker mechanisms:
DEC 2026 synchronized output, cell-level diff engine, DECSTBM hardware
scroll, double buffering, damage tracking, StylePool/CharPool caching,
60fps render throttling, cursor hiding, wide-char compensation, streaming
batch writes, alt-screen optimizations, flicker debug tracking, and
Windows/WSL workarounds.
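The core of DEC private mode 2026 (synchronized output) is to bracket each frame between Begin Synchronized Update (`CSI ? 2026 h`) and End Synchronized Update (`CSI ? 2026 l`), so a supporting terminal presents the frame in one repaint; terminals that do not recognize a private mode typically ignore it. The sketch also hides the cursor (`CSI ? 25 l`) while drawing. It illustrates the technique only and is not Claude Code's renderer; `wrapFrame` is a hypothetical helper.

```typescript
const ESC = "\x1b";
const BSU = `${ESC}[?2026h`;        // Begin Synchronized Update
const ESU = `${ESC}[?2026l`;        // End Synchronized Update
const HIDE_CURSOR = `${ESC}[?25l`;
const SHOW_CURSOR = `${ESC}[?25h`;

// Wrap one frame's output so the terminal can present it atomically.
// `synchronized` would come from a capability check (not modeled here).
function wrapFrame(frame: string, synchronized: boolean): string {
  const body = HIDE_CURSOR + frame + SHOW_CURSOR;
  return synchronized ? BSU + body + ESU : body;
}

// In Node the whole wrapped frame would go out in a single write, e.g.
// process.stdout.write(wrapFrame(diffOutput, true)), rather than many small writes.
```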

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address qwen3.6-plus-preview R1 review: 7 fixes

- #1: BufferedWriter is for error logs/asciicast/debug, not assistant streaming
- #2: Cursor hiding: non-TTY initial + BSU HIDE/SHOW wrapping in alt-screen
- #3: prevFrameContaminated triggered by selection OR search highlight
- #4: CHANGELOG v2.1.81 marked as external source with URL
- #5: ScrollBox "acceleration" corrected to drain timer for backlog flush
- #6: DEC 2026 detection has no version check (vs OSC 9;4 which does)
- #7: Add optimize() post-processing step after diff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address Copilot + qwen R2 review: 7 fixes

Copilot inline reviews:
- Fix source ref format: split into two full paths
- Clarify DEC 2026 is a mode number, not a year
- Fix wording: 后盾 ("backing") → 兜底 ("fallback")

qwen R2 fixes:
- R2-1: VTE does have version check (>= 6800), clarify "except VTE"
- R2-2: Add ink/optimizer.ts (93 LOC) to source file index
- R2-3/R2-4: Low priority, deferred

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 9, 2026
- Chrome Extension item (#5): add daemon browser as alternative approach
  (gstack's /browse: long-lived Chromium + CDP, ~100ms/command)
- /review improvements: add gstack's structured review methodology
  (SQL safety, LLM trust boundary, conditional side effects)
  and proactive trigger + /qa + /ship pipeline patterns
- Update /review report reference to 5-way comparison (+ gstack)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 10, 2026
Replace generic one-liners with specific user scenarios and impacts:
- #1: npm postinstall reading ~/.ssh/ and ~/.aws/credentials
- #2: 10-file rename failing at file 6 leaving inconsistent state
- #3: experimental features all-or-nothing without safe rollout
- #4: can't go back to turn 10 after wrong direction at turn 15
- #5: rm -rf and git push --force both classified as "write"
- #6: sudo bash -c "curl | sh" gaining root undetected
- #7: npm postinstall sending env vars to external server
- #8: JetBrains/Neovim authors reinventing private protocol
- #9: serial 3-module refactor taking 15min instead of 5min
- #12: project A and B API keys leaking across environments
- And 15 more items with similar concrete improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 15, 2026
User questioned necessity of p2-core item-20 ConfigTool. Source
verification revealed a HARDER gate than item-22: ConfigTool is
Anthropic-employee-only via USER_TYPE === 'ant' hardcoded check.

Evidence from /root/git/claude-code-leaked/tools.ts:214-215:

  ...(process.env.USER_TYPE === 'ant' ? [ConfigTool] : []),
  ...(process.env.USER_TYPE === 'ant' ? [TungstenTool] : []),

External Claude Code users NEVER see ConfigTool. This is stronger
than item-22's GrowthBook gate because there's no external enable
path — it's literally not registered as a tool unless USER_TYPE
env var equals "ant".
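The conditional spread in the excerpt means the tool is never registered at all unless the variable matches, which is why there is no external enable path. A minimal sketch of the same pattern, with hypothetical tool names and the environment passed in as a plain object:

```typescript
type Tool = { name: string };

const PublicTool: Tool = { name: "PublicTool" };             // hypothetical
const InternalOnlyTool: Tool = { name: "InternalOnlyTool" }; // hypothetical stand-in for ConfigTool

// Same shape as the excerpt: the spread contributes [] unless USER_TYPE === 'ant',
// so for everyone else the tool never appears in the registry.
function buildTools(env: Record<string, string | undefined>): Tool[] {
  return [
    PublicTool,
    ...(env.USER_TYPE === "ant" ? [InternalOnlyTool] : []),
  ];
}
```

Unlike a runtime feature flag, nothing downstream can turn the tool on: it is absent from the list the model is given.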

Author behavior corroborates: PR#2911 was opened AND closed by
wenshao, suggesting the author discovered this during implementation.

Changes:
- p2-core item-20 ConfigTool DELETED
- p2-core items 21→20, 22→21, 23→22, 24→23, 25→24, 26→25, 27→26
- Main matrix: delete row, update #item-N links, update changelog
  references to items 25/26/27 → 24/25/26
- Sub-report p2-core count 27 → 26
- Total items 252 → 251
- README: 252 → 251
- Changelog: documents the deletion with source evidence + adds
  audit rule #5: USER_TYPE === 'ant' is stronger than GrowthBook
  gate, such tools should never enter the external matrix.

Verified: 14+9+27+26+22+20+34+41+16+9+33 = 251 matches matrix count.
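Both the renumbering and the count check are mechanical, so they can be scripted. A minimal sketch, assuming each section is stored as an ordered array of items; `deleteAndRenumber` is hypothetical, and the counts are the 11 per-section totals from the verification line above.

```typescript
// Remove item `n` (1-based) from a section and renumber the items after it,
// e.g. deleting item 20 turns old item 21 into new item 20.
function deleteAndRenumber(items: string[], n: number): Map<number, string> {
  const renumbered = new Map<number, string>();
  items
    .filter((_, i) => i !== n - 1)
    .forEach((item, i) => renumbered.set(i + 1, item));
  return renumbered;
}

// Per-section counts as listed in the commit's verification line.
const sectionCounts = [14, 9, 27, 26, 22, 20, 34, 41, 16, 9, 33];
const total = sectionCounts.reduce((sum, c) => sum + c, 0);
console.log(total); // 251
```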

Related (not tracked): TungstenTool and REPLTool have the same
USER_TYPE === 'ant' gate — confirmed not in improvement report,
correct to leave untracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao added a commit that referenced this pull request Apr 27, 2026
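…04 details

Scan window: 2026-04-26 11:40 UTC (previous scan 1cf3196) → 2026-04-27 02:30 UTC.

## 🟢 OPEN→MERGED transitions (4 items + 2 carryover)

PRs that were 🟡 OPEN at the previous scan and merged in this window:
- PR#3607 feat(cli): Improve custom auth wizard with step indicators
  ← **first substantive merge toward the multi-day "third-party auth in Qwen is a hassle" discussion**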
- PR#3593 feat(cli): Add argument-hint support for slash commands
- PR#3640 fix(cli): guard gradient rendering without colors
- PR#3629 fix(config): support QWEN_CODE_API_TIMEOUT_MS across OAuth and non-OAuth
- PR#3643 feat: Adds Catalan language support
- PR#3609 fix(vscode-companion): slash command completion not triggering

## 🟢 Newly merged in this window (1 item)

- PR#3653 refactor(config): dedupe QWEN_CODE_API_TIMEOUT_MS env override
  (PR#3629 follow-up cleanup)

## 🟡 Newly OPEN (6 items)

Most notable:
- PR#3661 feat(vscode): tab dot indicator + notification system
  (stack redone 4 times: #3657/#3659/#3660 closed → #3661 OPEN)
- PR#3656 fix(core): recover from }{ glued JSONL records (#3606)
  ← overlaps with the item-7 session crash recovery direction
- PR#3645 fix(cli): correct OPENAI_MODEL precedence
  ← #3567 → #3633 revert → #3645, the third attempt
- PR#3647 fix(cli): keep sticky todo panel compact
- PR#3649 fix(lsp): expose status and startup diagnostics
- PR#3648 fix(acp): repair integration against current core API

## 📝 item-28 detailed update (PR#3604 OPEN)

The PR#3604 body explicitly cites "item-28 of the qwen-code engine improvement
report" and implements sub-items #1, #2, and #6 of the 9 (P0 cold start + P1 conditional):

- Sub-item #1 outer Promise.all: refreshCache's 4 serial layers → parallel
- Sub-item #2 inner Promise.all: listSkillsAtLevel + loadSkillsFromDir
- Sub-item #6 Conditional skills: new skill-activation.ts (118-line
  picomatch registry, project-root scoped) + a file-path trigger hook in
  coreToolScheduler.ts + <system-reminder> notification
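Sub-items #1 and #2 replace serial awaits with `Promise.all`, so the per-level loads overlap. A minimal sketch of the difference, with a hypothetical `loadLevel` standing in for the real skill loaders (their signatures are not shown here) and illustrative level names and latencies:

```typescript
// Hypothetical loader: resolves after `ms` milliseconds with the level's skill names.
function loadLevel(level: string, ms: number): Promise<string[]> {
  return new Promise((resolve) => setTimeout(() => resolve([`${level}-skill`]), ms));
}

// Illustrative levels and latencies; not the real refreshCache layers.
const LEVELS: Array<[string, number]> = [["builtin", 50], ["user", 50], ["project", 50], ["extension", 50]];

// Before: each level waits for the previous one (total ≈ sum of latencies).
async function loadSerial(): Promise<string[]> {
  const out: string[] = [];
  for (const [level, ms] of LEVELS) out.push(...(await loadLevel(level, ms)));
  return out;
}

// After: all levels start at once (total ≈ max latency). Promise.all keeps
// results in input order, so level precedence is preserved.
async function loadParallel(): Promise<string[]> {
  const perLevel = await Promise.all(LEVELS.map(([level, ms]) => loadLevel(level, ms)));
  return perLevel.reduce((acc, skills) => acc.concat(skills), [] as string[]);
}
```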

Engineering highlights:
- /ultrareview multi-agent review found 2 deep bugs:
  bug_001 cross-level shadow leaks paths
  bug_004 paths: + disable-model-invocation contradiction self-check
- Whole workspace: 10,959 tests passing / CI 9 jobs green

Remaining 6 sub-items (still to be implemented):
- #4 sentSkillNames (the largest runtime token saving)
- #3 memoize() / #5 suppressNext / #7 debounce / #8 Bun polling /
  #9 realpath dedup

## README sync

Merged PR count: 94 → 101.
wenshao added a commit that referenced this pull request May 18, 2026
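…oyment + packaging contract pinned down

Source: chiga0 left a long design-level critique on Issue #3803 (written with help from GPT-5 Codex):
"Mode B is right, but the deployment + packaging contract must be made explicit first; otherwise
 the same code path has to serve three scenarios at once (local CLI / remote devbox / enterprise cloud),
 and the boundaries will blur"

6 contract items: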

1. Deployment forms, 3 kinds:
   - Local single-machine (primary install; daemon auto-started on loopback)
   - Cloud/devbox remote-runtime (workspace and runtime must be colocated)
   - ❌ Local workspace + remote daemon (not recommended; the daemon cannot see local resources)

   Core invariant: daemon host = runtime host
   File access / shell tools / LSP / provider auth / MCP servers / skills
   are all evaluated in the daemon environment
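The invariant can be encoded as a check over where each piece runs. A minimal sketch with hypothetical type and function names; it accepts the two supported forms and rejects the third, because the workspace would sit on a different host from the daemon that evaluates file access and tools.

```typescript
type Host = "local" | "remote";
interface Deployment { daemon: Host; runtime: Host; workspace: Host }

// Core invariant: daemon host = runtime host. Tools, LSP, auth, MCP and skills
// are evaluated where the daemon runs, so the workspace must live there too.
function checkDeployment(d: Deployment): string | null {
  if (d.daemon !== d.runtime) return "daemon host must equal runtime host";
  if (d.workspace !== d.runtime) return "workspace must be colocated with the runtime";
  return null; // supported form
}
```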

2. Server / client / adapter: 3-layer package boundary:
   @qwen-code/daemon-server (qwen serve / HTTP / EventBus / ACP bridge / FS / sandbox)
   @qwen-code/daemon-client (DaemonClient / DaemonSessionClient / typed event /
                            reducer / reconnect / heartbeat / capability negotiation)
   @qwen-code/daemon-adapters-* (tui / channel / ide-web / output sinks)

   3 dependency-direction rules (enforced even without a physical package split):
   - server code must not depend on TUI / IDE / channel adapters
   - adapters may depend only on the SDK / protocol / reducer surface
   - the shared reducer + typed protocol helpers must live in the client/protocol layer,
     not in packages/cli/src/serve/
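Rules like these can be enforced without a physical split by checking import edges between layers. A minimal sketch, assuming each module has already been tagged with a layer; the layer names mirror the proposed packages, and the allow-list is one assumed reading of the three rules above, not an existing lint rule.

```typescript
type Layer = "server" | "protocol" | "client" | "adapter";

// Which layers each layer may import from (assumed reading of the three rules).
const ALLOWED: Record<Layer, Layer[]> = {
  server: ["server", "protocol"],              // never TUI / IDE / channel adapters
  protocol: ["protocol"],
  client: ["client", "protocol"],              // shared reducer + typed protocol helpers live here
  adapter: ["adapter", "client", "protocol"],  // SDK / protocol / reducer surface only
};

interface ImportEdge { from: string; fromLayer: Layer; to: string; toLayer: Layer }

// Returns one message per import that crosses a boundary in the wrong direction.
function violations(edges: ImportEdge[]): string[] {
  return edges
    .filter((e) => ALLOWED[e.fromLayer].indexOf(e.toLayer) < 0)
    .map((e) => `${e.from} (${e.fromLayer}) must not import ${e.to} (${e.toLayer})`);
}
```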

3. Local auto-daemon UX (new design sketch):
   qwen → discover daemon → if absent auto-start qwen serve on loopback
        → attach TUI → lifecycle policy on TUI exit
   To be added before Wave 6 release hardening (ahead of the PR 28 npm alpha)

4. Sandbox runner model:
   Current Stage 1: qwen serve → qwen --acp child, all on the daemon host
   Desired future: client → daemon control plane → runtime worker / sandbox runner
   The daemon keeps the control plane; the sandbox fails and restarts independently
   sandbox down ≠ daemon down
   MCP / skills must run in the runtime worker, not in the client
   client capability reverse RPC is strictly scoped to editor/clipboard/browser/notification/file_picker
   and must not silently become a fallback for MCP/skill execution
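The scoping rule for client capability reverse RPC amounts to a closed allow-list checked before dispatch, with MCP and skill execution never routed to the client. A minimal sketch; the capability names come from the list above, while `routeCall` and the string call names are hypothetical.

```typescript
const CLIENT_CAPABILITIES = ["editor", "clipboard", "browser", "notification", "file_picker"] as const;
type ClientCapability = typeof CLIENT_CAPABILITIES[number];

function isClientCapability(name: string): name is ClientCapability {
  return (CLIENT_CAPABILITIES as readonly string[]).indexOf(name) >= 0;
}

// Decide where a call runs. Anything outside the allow-list (e.g. MCP tools,
// skills) goes to the runtime worker; there is no silent fallback to the client.
function routeCall(name: string): "client" | "runtime-worker" {
  return isClientCapability(name) ? "client" : "runtime-worker";
}
```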

5. Client default migration gate, 3-condition checklist:
   - control-plane parity (all 9 dialogs wired up)
   - reducer / adapter quality (no raw event spam)
   - auto-daemon lifecycle (item 3 above)
   The current PR#4266 / PR#4267 drafts comply (behind-flag experiment)

6. Suggested doc updates: Deployment forms + Package boundary + Auto-daemon UX
   + Runtime locality + Sandbox runner model + Client migration gate

Doc sync:
- §01 §三 dual deployment modes: add §三·一 Deployment forms 3-form table
  + core invariant daemon host = runtime host
- §04 §五 Runtime locality: add a 2026-05-18 upgrade callout, cross-linking §01 §三·一 + §06 §三·二
- §06 §三·二: new section "Deployment + package contract (chiga0 #3803 comment)"
  with all 6 items + a comparison table against the current codebase

Current status:
- 1 Deployment forms: ✅ §三·一 3-form table added
- 2 Package boundary: ⚠️ daemon-server/daemon-client/daemon-adapters not yet officially named
- 3 Auto-daemon UX: ❌ to be added before Wave 6
- 4 Runtime locality: ✅ §04 §五 + callout cross-link added
- 5 Sandbox runner model: after Wave 5 PR 24, or as a separate Stage 2
- 6 Client migration gate: ✅ §三·二 #5 3-condition checklist added
wenshao added a commit that referenced this pull request May 18, 2026
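After the zero-cost abstraction principle was adopted, 3 places still had stale wording:

§04 §六 Deployment shape matrix line 309:
- "1. Local - Local" was previously listed as the "default migration target for existing users" and included TUI
- Fix: split into 1 (local single-user TUI in-process) + 1b (multi-client collaboration daemon)
- Marked 🌟 permanent default UX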

§06 §三·二 Shape table line 256:
- "Local - Local: daemon/runtime/workspace on this machine" was previously the "qwen default migration target"
- Fix: split into local single-user TUI (no daemon) + multi-client collaboration (loopback daemon)

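§06 §三·二 #3 Local auto-daemon UX section:
- Previously "local TUI goes through the daemon by default and must not become..."
- Changed to "multi-client collaboration scenarios only, not the local TUI default"
- Added a callout citing #3803 reply 4483031818
- Suggests Mode A may be a better fit
- Marked "not on the mainline before Wave 6"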

§06 §三·二 #5 Client default migration gate:
- Previously "switching the TUI / channel / IDE default must wait for..."
- Changed to ⚠️ the TUI default never switches; the gate applies only to channel/web/IDE
- Distinguishes PR#4266 (TUI exp = opt-in advanced, permanently behind a flag)
  vs PR#4267 (IDE exp = compliant behind-flag experiment for default migration)
- Cites #4175 reply 4483033542 (Wave 5 PR 26 scope narrowed)

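Cross-chapter consistency now holds with:
- §02 §7 🌟 design principles section (local TUI in-process)
- §04 §一 Mode B topology core characteristics (callout rewritten)
- §04 §二 4 TUI forms (🌟 #1 priority marked)
- §04 §六 Deployment shape matrix (fixed in this commit)
- §06 §三·二 (3 places fixed in this commit)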

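Overall tally (35 MERGED + 5 OPEN + 3 CLOSED, Wave plan 22.5/31 ≈ 73%) is consistent across all chapters ✅

Remaining non-conflicting items (not addressed in this commit):
- §03 http-api content dates from an earlier stage (no Wave 1-5 PR refs); can be refreshed later
- §05 permission-auth likewise; can be refreshed later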