feat: cache diagnostics v1 — /cache-miss-report, doctor --cache, prefix hash evidence | 缓存诊断 v1 #2188

Merged: esengine merged 4 commits into esengine:main from SivanCola:pr/1-cache-diagnostics-v1 on May 29, 2026

@SivanCola

Collaborator

Summary

This PR introduces the first phase of cache diagnostics for Reasonix, providing real-time visibility into prompt cache performance. A new evidence layer captures per-turn prefix hashes and infers miss reasons, enabling developers and operators to understand why cache misses occur.

Key deliverables:

  • A new /cache-miss-report slash command (aliases: /cache-report, /cache) for in-session cache diagnostics
  • New CLI commands: reasonix doctor --cache and reasonix doctor-cache for cache-stability health checks
  • Per-turn diagnostic data stored in SessionStats for live reporting within the dashboard
  • Backward-compatible session metadata via a cacheDiagnostics field
  • i18n support for EN, zh-CN, and de
  • Comprehensive tests for all new cache diagnostic endpoints
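
The evidence-layer idea in the summary (hash each part of the prompt prefix per turn, then compare against the previous turn to guess why a miss happened) can be sketched roughly as below. This is a minimal illustration, not the code in src/telemetry/cache-diagnostics.ts: the type names, the reason labels, and the FNV-1a stand-in hash are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface PrefixParts {
  system: string;
  tools: readonly { name: string; description: string }[];
}

interface PrefixHashes {
  system: string;
  tools: string;
}

// Assumed reason labels; the PR may use different names or more categories.
type MissReason = "first-turn" | "system-changed" | "tools-changed" | "prefix-stable";

// Small non-cryptographic hash (32-bit FNV-1a) used as a stand-in so the
// sketch has no dependencies. A real implementation may use a different hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Hash each prefix section separately so a change can be attributed to one section.
function hashPrefix(parts: PrefixParts): PrefixHashes {
  return {
    system: fnv1a(parts.system),
    tools: fnv1a(JSON.stringify(parts.tools)),
  };
}

// Compare this turn's hashes with the previous turn's to infer a likely miss reason.
function inferMissReason(prev: PrefixHashes | undefined, curr: PrefixHashes): MissReason {
  if (prev === undefined) return "first-turn";
  if (prev.system !== curr.system) return "system-changed";
  if (prev.tools !== curr.tools) return "tools-changed";
  return "prefix-stable";
}

// Usage: the second turn adds a tool, so the tools hash changes.
const turn1 = hashPrefix({
  system: "You are helpful.",
  tools: [{ name: "read", description: "Read a file" }],
});
const turn2 = hashPrefix({
  system: "You are helpful.",
  tools: [
    { name: "read", description: "Read a file" },
    { name: "grep", description: "Search files" },
  ],
});
const firstReason = inferMissReason(undefined, turn1);
const secondReason = inferMissReason(turn1, turn2);
```

Hashing the sections separately is what makes the inference possible: a single hash over the whole prefix would show that something changed, but not which part.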

Changes

  • New file: src/telemetry/cache-diagnostics.ts — evidence layer for per-turn prefix hashes and miss reason inference
  • Enhanced: src/telemetry/stats.ts — store per-turn diagnostics in SessionStats
  • Enhanced: src/cli/commands/doctor.ts — add cache-stability check mode
  • Enhanced: src/cli/index.ts — register doctor --cache and doctor-cache CLI commands
  • Enhanced: src/cli/ui/slash/commands.ts — register /cache-miss-report slash command
  • Enhanced: src/cli/ui/slash/handlers/observability.ts — add the /cache-miss-report handler
  • Enhanced: src/loop.ts — integrate cache diagnostics into the agent loop
  • Enhanced: src/loop/types.ts — add CacheDiagnosticEntry type
  • Enhanced: src/memory/runtime.ts — track cache diagnostics in runtime state
  • Enhanced: src/memory/session.ts — backward-compatible cacheDiagnostics field
  • Enhanced: src/transcript/log.ts — log cache diagnostic events
  • Enhanced: src/i18n/EN.ts, src/i18n/de.ts, src/i18n/zh-CN.ts — i18n keys
  • Enhanced: .gitignore — add agent.md, bugs.md, todo.md
  • New test: tests/cache-diagnostics.test.ts — comprehensive cache diagnostic tests
  • Enhanced: tests/doctor-json.test.ts, tests/slash.test.ts, tests/ui-slash-suggestions.test.tsx

Test Plan

  1. Run the unit test suite:
    npx vitest run tests/cache-diagnostics.test.ts tests/doctor-json.test.ts tests/slash.test.ts tests/ui-slash-suggestions.test.tsx
    
  2. Verify the /cache-miss-report slash command works interactively in a dev session
  3. Run reasonix doctor --cache and verify JSON and text output formats
  4. Run reasonix doctor-cache and confirm it defaults to cache-only mode
  5. Confirm backward compatibility: sessions without cacheDiagnostics load without errors
  6. Verify i18n strings render correctly in EN, zh-CN, and de locales
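
The backward-compatibility check in step 5 comes down to treating cacheDiagnostics as an optional field. A minimal sketch of that pattern follows. Only the field name cacheDiagnostics and the type name CacheDiagnosticEntry come from this PR; the entry fields and the helpers parseSessionMeta and diagnosticsOf are hypothetical.

```typescript
// Hypothetical entry shape; the real CacheDiagnosticEntry in src/loop/types.ts may differ.
interface CacheDiagnosticEntry {
  turn: number;
  prefixHash: string;
  missReason?: string;
}

// The field is optional, so session files written before this PR still type-check.
interface SessionMeta {
  id: string;
  cacheDiagnostics?: CacheDiagnosticEntry[];
}

// Hypothetical loader: accepts metadata with or without the new field.
function parseSessionMeta(json: string): SessionMeta {
  const raw = JSON.parse(json) as Partial<SessionMeta>;
  if (typeof raw.id !== "string") throw new Error("session meta is missing id");
  const meta: SessionMeta = { id: raw.id };
  if (Array.isArray(raw.cacheDiagnostics)) meta.cacheDiagnostics = raw.cacheDiagnostics;
  return meta;
}

// Readers fall back to an empty list instead of failing on old sessions.
function diagnosticsOf(meta: SessionMeta): readonly CacheDiagnosticEntry[] {
  return meta.cacheDiagnostics ?? [];
}

const oldSession = parseSessionMeta('{"id":"s1"}');
const newSession = parseSessionMeta(
  '{"id":"s2","cacheDiagnostics":[{"turn":1,"prefixHash":"ab12"}]}',
);
const oldCount = diagnosticsOf(oldSession).length; // 0: field absent on old sessions
const newCount = diagnosticsOf(newSession).length; // 1
```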

Verification

npm run lint
npm run typecheck
npx vitest run tests/cache-diagnostics.test.ts tests/doctor-json.test.ts tests/slash.test.ts tests/ui-slash-suggestions.test.tsx

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0d128dff8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/loop.ts Outdated
@SivanCola SivanCola force-pushed the pr/1-cache-diagnostics-v1 branch from e0d128d to f520f05 Compare May 28, 2026 19:16
@SivanCola SivanCola changed the title feat: cache diagnostics v1 — /cache-miss-report, doctor --cache, prefix hash evidence feat: cache diagnostics v1 — /cache-miss-report, doctor --cache, prefix hash evidence | 缓存诊断 v1 May 28, 2026
@SivanCola

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6426b6823f

Comment thread src/cli/ui/edit-tool-gate.ts Outdated
@SivanCola

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6212f5e0ea

Comment thread src/loop.ts Outdated
Comment thread src/mcp/registry.ts
@esengine

Owner

Before I can review this series (#2188 through #2192) I need the branch structure fixed. Right now it's not reviewable as five PRs.

The problem: all five PRs (pr/1-cache-diagnostics-v1 through pr/5-ssh-remote-rfc) target main as base, but each branch is stacked on the previous one, so every PR shows the same cumulative +1740/-39 diff across 33 files. I confirmed it: this 'cache diagnostics' PR (#2188) already contains the delete_range, delete_symbol (tree-sitter), and ssh-remote code (56 references). So #2188 through #2191 are literally the same giant diff: merging any one of them would merge all five features at once, and their titles don't match their contents.

Please restructure into independent, single-feature PRs (preferred for fork PRs):

  • pr/1 = only the cache-diagnostics change off main
  • pr/2 = only the MCP canonicalization change off main
  • pr/3 = only delete_range
  • pr/4 = only delete_symbol
  • pr/5 = only the SSH RFC

Each branched fresh from main with just that feature's commits (cherry-pick or interactive rebase to drop the others). Then each PR's diff is its own feature and I can review them on their merits — these are all things I want to look at properly (especially the two file-deleting tools and the SSH RFC, which need careful individual review).

If you'd rather keep them stacked, that requires pushing the intermediate branches to this upstream repo and pointing each PR's base at the previous branch — but for a fork, independent branches off main is simpler. Let me know once they're split and I'll dig into each.

@SivanCola SivanCola force-pushed the pr/1-cache-diagnostics-v1 branch from 6212f5e to 0256e1d Compare May 29, 2026 03:23
@SivanCola

Collaborator Author

Updated the PR series as requested. The five fork branches are now independent branches off main, each with one feature commit instead of the previous shared cumulative stack.

Branch heads now are:

I rechecked the GitHub PR file lists after pushing: each PR now shows only its own feature scope rather than the old 33-file cumulative diff.

Verification run locally:

  • Focused Vitest suites for each independent PR branch passed.
  • npm run typecheck passed on each independent PR branch.
  • npm run lint passed on the branches that were committed before hooks were installed; the two later amended branches also passed the pre-commit lint hook.
  • The pre-push hook ran npm run verify successfully during the multi-ref push (build, lint, typecheck, and full test suite: 308 files / 3960 passed / 9 skipped).

Root cause was branch topology: the branch names were separate, but all five remote heads pointed at the same stacked tip. I rebuilt and force-updated the fork heads with --force-with-lease so the existing PRs can now be reviewed independently.

@SivanCola

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0256e1df82

Comment thread src/loop.ts Outdated

@esengine esengine left a comment

Owner

Valuable feature — cache observability is squarely on-mission for a cache-first agent, and the reporting side is clean: doctor --cache / runCacheDoctorChecks (dynamic-prompt, MCP order, skills/memory, hooks, evidence) and /cache-miss-report read saved-session cacheDiagnostics evidence — all read-only, no behavior change. The session-meta cacheDiagnostics? field is backward-compatible (optional, absent on old sessions). Good.

One thing to confirm before I merge, since it's on the hot path: the loop now calls this.prefix.diagnosticHashes() before every API call. The prefix is immutable (only changes on addTool/removeTool/replaceSystem), so please confirm prefixDiagnosticHashes is memoized — computed once and invalidated on those mutations, like the existing _fingerprintCache — rather than re-hashing the full system+tools+few-shots prefix every turn. For a large prefix (30-50 MCP tools), re-hashing per turn would be exactly the kind of per-turn cost this product otherwise works hard to avoid. If it already caches off the same invalidation signal as the fingerprint, this is good to merge; if not, that's the one change I'd want. Could you point me at the diagnosticHashes implementation / confirm the caching?

@SivanCola

Collaborator Author

Addressed the diagnostic hash memoization concern from review 4386489854 in e1e93e3.

Implementation:

  • ImmutablePrefix now memoizes PrefixDiagnosticHashes in a WeakMap keyed by immutable/frozen tool snapshots.
  • diagnosticHashes() returns cached evidence for the frozen prefix.tools() snapshot used on the hot path, while mutable external snapshots are computed without caching.
  • replaceSystem, addTool, and removeTool all clear the diagnostic hash cache via the same prefix-cache invalidation path as _fingerprintCache.
  • CacheFirstLoop still passes the actual turn-start toolSpecs snapshot, so the earlier MCP hot-add accuracy fix is preserved.
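
The memoization scheme described in these bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from e1e93e3: ImmutablePrefixSketch, the ToolSpec and PrefixDiagnosticHashes shapes, the hashCount counter, and the FNV-1a stand-in hash are all hypothetical. Only the WeakMap keyed by the frozen tool snapshot and the invalidation on replaceSystem/addTool come from the description above.

```typescript
// Hypothetical stand-ins for the PR's ToolSpec and PrefixDiagnosticHashes types.
interface ToolSpec {
  name: string;
  description: string;
}
interface PrefixDiagnosticHashes {
  system: string;
  tools: string;
}

// Dependency-free stand-in hash (32-bit FNV-1a); the real hash may differ.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

class ImmutablePrefixSketch {
  private system: string;
  private frozenTools: readonly ToolSpec[];
  // Keyed by the frozen snapshot object, so a new snapshot is automatically a new key.
  private cache = new WeakMap<readonly ToolSpec[], PrefixDiagnosticHashes>();
  hashCount = 0; // instrumentation for the example only

  constructor(system: string, tools: ToolSpec[]) {
    this.system = system;
    this.frozenTools = Object.freeze([...tools]);
  }

  // Returns the same frozen array until a mutation replaces it.
  tools(): readonly ToolSpec[] {
    return this.frozenTools;
  }

  addTool(tool: ToolSpec): void {
    this.frozenTools = Object.freeze([...this.frozenTools, tool]);
    this.cache = new WeakMap(); // invalidate alongside other prefix caches
  }

  replaceSystem(system: string): void {
    this.system = system;
    this.cache = new WeakMap(); // system text is part of the hashed evidence
  }

  diagnosticHashes(snapshot: readonly ToolSpec[]): PrefixDiagnosticHashes {
    // Only frozen snapshots are safe to cache; mutable arrays could change under the key.
    const cacheable = Object.isFrozen(snapshot);
    const hit = cacheable ? this.cache.get(snapshot) : undefined;
    if (hit !== undefined) return hit;
    this.hashCount++;
    const hashes: PrefixDiagnosticHashes = {
      system: fnv1a(this.system),
      tools: fnv1a(JSON.stringify(snapshot)),
    };
    if (cacheable) this.cache.set(snapshot, hashes);
    return hashes;
  }
}

// Usage: two turns with an unchanged prefix hash only once.
const prefix = new ImmutablePrefixSketch("sys", [{ name: "read", description: "Read a file" }]);
const firstTurn = prefix.diagnosticHashes(prefix.tools());
const secondTurn = prefix.diagnosticHashes(prefix.tools());
```

Keying on the snapshot object rather than on its contents keeps the hot-path lookup to one WeakMap read, and it means an externally supplied mutable array never poisons the cache.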

Regression coverage:

  • repeated diagnosticHashes(prefix.tools()) returns the memoized object;
  • replaceSystem, addTool, and removeTool invalidate and recompute;
  • the existing mid-turn hot-add loop test still verifies diagnostics record the sent snapshot.

Verification:

  • npm test -- tests/memory.test.ts tests/loop.test.ts tests/cache-diagnostics.test.ts
  • npm run lint
  • npm run typecheck
  • pre-push npm run verify passed: build, lint, typecheck, and full test suite (308 test files, 3955 passed, 9 skipped).

…ics-v1

# Conflicts:
#	tests/ui-slash-suggestions.test.tsx
@SivanCola

Collaborator Author

Resolved the branch conflicts by merging current upstream/main into pr/1-cache-diagnostics-v1 in merge commit 43a113a1.

Conflict handled:

Verification:

  • npm test -- tests/ui-slash-suggestions.test.tsx tests/slash.test.ts tests/cache-diagnostics.test.ts tests/memory.test.ts tests/loop.test.ts
  • npm run lint
  • npm run typecheck
  • pre-push npm run verify passed: build, lint, typecheck, and full test suite (311 test files, 4007 passed, 9 skipped).

GitHub now reports mergeStateStatus: UNSTABLE instead of DIRTY, so the merge conflict is resolved; remaining mergeability, if any, should be CI/status related rather than file conflicts.

@SivanCola

Collaborator Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Chef's kiss.

@esengine esengine left a comment

Owner

Re-reviewed — the hot-path concern is resolved. diagnosticHashes is now memoized via _diagnosticHashesCache = new WeakMap<readonly ToolSpec[], PrefixDiagnosticHashes>(), keyed by the immutable tool snapshot and invalidated alongside the other prefix caches. Since this.tools() returns the stable frozen snapshot (#2162), the WeakMap hits the same key until a tool mutation, so the prefix is hashed once per prefix-state rather than every turn — no per-turn re-hash of the full system+tools. Combined with the read-only reporting (doctor --cache / cache-miss-report) and the backward-compatible session-meta field, this is good. CLEAN + CI green. Merging.

@esengine esengine merged commit e472735 into esengine:main May 29, 2026
4 checks passed

2 participants