Cache Efficiency Guardrails and Diagnostics / 缓存效率守卫与诊断 by SivanCola · Pull Request #2314 · esengine/DeepSeek-Reasonix

SivanCola · 2026-05-29T15:59:34Z

Summary

This PR adds cache-efficiency guardrails for DeepSeek-style high cache-hit sessions. It focuses on keeping the immutable prefix byte-stable, making cache churn observable, and avoiding compaction decisions that reduce total cache efficiency.

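The changes target a high cache-hit rate and fall into six areas: stabilizing the prefix shape, diagnosing cache churn, controlling tool schema cost, improving fold economics, trimming reasoning history, and rounding out probe/test/replay/UI compatibility.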

Major Changes: Principle and Effect

1. Prefix-shape diagnostics and cache churn attribution

Principle: the provider-side prompt cache is sensitive to the byte shape of the reusable request prefix. Even when semantic content is unchanged, a change in the system prompt, tool schema ordering, or few-shot payloads, or a transcript rewrite, can turn a warm prefix into a cold one. The new diagnostics hash these prefix components and compare snapshots across turns.

Effect: runtime stats can now explain why a cache miss happened instead of only reporting that it happened. /status, telemetry stats, replay summaries, and cache-related UI surfaces can show miss tokens, schema tokens, prefix changes, and top tool schema contributors. This also complements the existing /cache-miss-report path from upstream by adding local shape-level attribution.
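To illustrate the snapshot comparison described above, here is a minimal sketch, not the PR's actual code. The `PrefixParts` fields and the `snapshotPrefix` / `diffSnapshots` names are hypothetical. It assumes each prefix component is available as the exact string that gets sent.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the reusable request prefix; field names are illustrative.
interface PrefixParts {
  systemPrompt: string;
  toolSchemas: string; // serialized tool list, exactly as sent
  fewShots: string; // serialized few-shot payload
}

type PrefixSnapshot = Record<keyof PrefixParts, string>;

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hash each component separately so a change can be attributed to one part.
function snapshotPrefix(parts: PrefixParts): PrefixSnapshot {
  return {
    systemPrompt: sha256(parts.systemPrompt),
    toolSchemas: sha256(parts.toolSchemas),
    fewShots: sha256(parts.fewShots),
  };
}

// Return the names of the components whose hash differs between two turns.
function diffSnapshots(prev: PrefixSnapshot, next: PrefixSnapshot): (keyof PrefixParts)[] {
  const keys = Object.keys(prev) as (keyof PrefixParts)[];
  return keys.filter((key) => prev[key] !== next[key]);
}
```

Hashing per component rather than hashing the whole prefix is what allows a miss to be attributed to a specific part, for example a reordered tool list with an unchanged system prompt.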

2. Stable tool schema ordering and schema cost governance

Principle: the complete tool list is part of the model request prefix. If MCP reconnects or dynamic registration produce the same logical tool set in a different order, the serialized request changes and cache reuse can be lost. Sorting tool specs by function name makes the schema prefix deterministic. Estimating each schema's token cost makes large tool definitions visible.

Effect: avoidable cold turns caused by MCP reconnect/order churn are reduced. The UI can surface expensive tool schemas in the context breakdown, so cache and token issues caused by large schemas are easier to diagnose. Tests now lock the reconnect/prefix invariants around this behavior.
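A minimal sketch of deterministic ordering plus a per-schema cost estimate follows. It is an illustration under assumptions, not the repository's implementation: the `ToolSpec` shape is a reduced OpenAI-style function spec, the 4-characters-per-token ratio is a rough heuristic, and `topSchemaContributors` is a hypothetical helper. The comparison uses plain string operators, which order by UTF-16 code unit and therefore do not depend on the host locale.

```typescript
// Reduced OpenAI-style tool spec; only the fields used here are modeled.
interface ToolSpec {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
}

// Order by UTF-16 code units instead of localeCompare, so the result
// is the same on every host locale.
function compareCodeUnits(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a sorted copy; the caller's registration-order array is left untouched.
function sortToolSpecs(specs: readonly ToolSpec[]): ToolSpec[] {
  return [...specs].sort((x, y) => compareCodeUnits(x.function.name, y.function.name));
}

// Rough heuristic (assumption): about 4 characters of JSON per token.
function estimateSchemaTokens(spec: ToolSpec): number {
  return Math.ceil(JSON.stringify(spec).length / 4);
}

// Largest schemas first; ties are broken by name so the output is stable.
function topSchemaContributors(
  specs: readonly ToolSpec[],
  limit: number,
): { name: string; tokens: number }[] {
  return specs
    .map((spec) => ({ name: spec.function.name, tokens: estimateSchemaTokens(spec) }))
    .sort((a, b) => b.tokens - a.tokens || compareCodeUnits(a.name, b.name))
    .slice(0, limit);
}
```

With this ordering, two registries holding the same logical tools serialize to the same bytes regardless of the order in which the tools were registered.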

3. Fold economics for compaction decisions

Principle: summarization/folding is not free. A fold creates a new summary segment that is cold at first and adds immediate request cost. Normal-band folding should only happen when the expected multi-turn savings exceed the summary and post-fold cold tax. Aggressive folding still protects the context window when headroom is genuinely low.

Effect: the context manager avoids cost-negative folds that would lower cache efficiency in medium-length sessions, while still preserving safety near context limits. New tests cover both the conservative normal-band behavior and cache-aligned fold invariants.
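The trade-off above can be sketched as a simple break-even check. This is a hypothetical cost model, not the one in the PR: the `FoldEstimate` and `TokenPrices` fields, the `aggressiveHeadroom` default, and the assumption that everything after the fold point is resent cold exactly once are all illustrative.

```typescript
// Hypothetical inputs to a fold decision.
interface FoldEstimate {
  foldedTokens: number; // history tokens the fold would replace
  summaryTokens: number; // size of the generated summary
  retainedTailTokens: number; // tokens after the fold point, resent cold once
  expectedFutureTurns: number; // how many more turns the session is expected to run
}

// Cost per token in arbitrary but consistent units.
interface TokenPrices {
  cacheHit: number;
  cacheMiss: number;
  output: number;
}

function foldNetSavings(e: FoldEstimate, p: TokenPrices): number {
  // Every future turn stops paying the cache-hit price for the removed tokens.
  const savedPerTurn = (e.foldedTokens - e.summaryTokens) * p.cacheHit;
  const savings = savedPerTurn * e.expectedFutureTurns;
  // One-time cost of generating the summary.
  const summaryCost = e.summaryTokens * p.output;
  // Post-fold cold tax: on the first turn after the fold, the summary and the
  // retained tail miss the cache instead of hitting it.
  const coldTax = (e.summaryTokens + e.retainedTailTokens) * (p.cacheMiss - p.cacheHit);
  return savings - summaryCost - coldTax;
}

// headroomRatio is the fraction of the context window still free (0..1).
function shouldFold(
  e: FoldEstimate,
  p: TokenPrices,
  headroomRatio: number,
  aggressiveHeadroom = 0.1,
): boolean {
  // Safety first: fold regardless of cost when the window is nearly full.
  if (headroomRatio < aggressiveHeadroom) return true;
  // Normal band: fold only when it is expected to pay for itself.
  return foldNetSavings(e, p) > 0;
}
```

In this model a fold in a short remaining session is rejected because the one-time costs dominate, while the same fold becomes worthwhile once enough future turns are expected.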

4. Reasoning retention and healing for tool-call history

Principle: thinking/reasoning models need reasoning fields to round-trip correctly for assistant messages that contain tool calls. However, stale plain assistant reasoning can bloat future request bodies and make prefix shape less stable. The healing path now strips stale plain reasoning while re-stamping only the tool-call assistant turns that require reasoning continuity.

Effect: tool-call transcripts remain API-safe for reasoning models, while unnecessary reasoning payload is removed from future requests. This reduces request bloat and lowers the chance of cache churn from stale assistant-only reasoning content.
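A rough sketch of such a healing pass is shown below. It assumes an OpenAI-style message list with a DeepSeek-style `reasoning_content` field, and it assumes that "re-stamping" means guaranteeing the field is present on assistant turns that carry tool calls. The PR's actual logic is not shown in this description, so `healReasoning` and the empty-string fallback are hypothetical.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  reasoning_content?: string; // assumed field name for reasoning text
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Returns a new history; the input array and its messages are not mutated.
function healReasoning(history: readonly ChatMessage[]): ChatMessage[] {
  return history.map((msg) => {
    if (msg.role !== "assistant") return msg;

    const hasToolCalls = (msg.tool_calls?.length ?? 0) > 0;
    if (hasToolCalls) {
      // Tool-call turn: keep reasoning, and make sure the field exists
      // (assumption: an empty string is an acceptable placeholder).
      return { ...msg, reasoning_content: msg.reasoning_content ?? "" };
    }

    // Plain assistant turn: drop stale reasoning so it is not resent.
    const healed: ChatMessage = { ...msg };
    delete healed.reasoning_content;
    return healed;
  });
}
```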

5. Probe and regression guardrails

Principle: cache behavior should be testable without relying only on provider internals. Deterministic shape tests verify local invariants, and live/offline probes measure whether those invariants translate into high cache-hit behavior over realistic loops.

Effect: this PR adds scripts/probe-cache-shape.mts, updates loop and long-session probes, and adds tests for cache shape and fold economics. I also ran the testing tool from PR #2306 against this branch via a temporary overlay, so the change is validated by the new offline cache guard scenarios as well.

6. Documentation, replay, and UI compatibility

Principle: adding cache summary fields is only useful if old transcripts and replay paths remain readable. UI and replay defaults must tolerate sessions that were recorded before these fields existed.

Effect: App, ReplayApp, transcript replay, localized labels, and real-world cache benchmark docs were updated together. Existing sessions stay backwards compatible, and benchmark documentation now explicitly calls out expected cold summary segments after compaction.
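One common way to keep older transcripts readable is to treat the new fields as optional and fill defaults at read time. The sketch below shows that pattern with hypothetical names (`CacheSummary`, `ReplayRecord`, `readCacheSummary`); the field list mirrors the stats mentioned in this PR but is not taken from the code.

```typescript
// Hypothetical cache summary attached to each recorded turn.
interface CacheSummary {
  missTokens: number;
  schemaTokens: number;
  prefixChanges: number;
}

// Records written before the fields existed simply lack `cache`,
// or carry only some of its properties.
interface ReplayRecord {
  turn: number;
  cache?: Partial<CacheSummary>;
}

// Normalize a record so UI and replay code never see undefined values.
function readCacheSummary(record: ReplayRecord): CacheSummary {
  const cache = record.cache ?? {};
  return {
    missTokens: cache.missTokens ?? 0,
    schemaTokens: cache.schemaTokens ?? 0,
    prefixChanges: cache.prefixChanges ?? 0,
  };
}
```

Defaulting at the read boundary means rendering code can stay free of null checks, and old session files do not need to be migrated.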

Verification

git diff --cached --check: passed
npm run typecheck: passed
npm run lint: passed with one existing non-fatal warning in src/cli/ui/PlanPanel.tsx about a type-only React import
npx vitest run tests/cache-shape.test.ts tests/context-manager-fold-economics.test.ts tests/ctx-breakdown.test.ts tests/mcp-reconnect-prefix-invariant.test.ts tests/telemetry.test.ts tests/loop-r1-reasoning.test.ts tests/context-manager-cache-aligned-fold.test.ts: passed, 84 tests
npx tsx scripts/probe-cache-shape.mts: passed
npm run build:dashboard: passed
npx vitest run tests/loop.test.ts tests/dashboard-smoke.test.ts: passed, 85 passed and 1 skipped
Full local suite before push: npm test -- --run: passed, 315 files passed and 1 skipped, 4047 tests passed and 12 skipped
Pre-push verify hook: npm run build && npm run lint && npm run typecheck && npm run test --silent: passed, 316 files passed, 4050 tests passed and 9 skipped

PR #2306 cache guard run

I fetched the testing tool from #2306 and ran it against a temporary overlay of this final branch.

npm run cache:guard: passed

plain-dialogue: 98.4%, PASS
tool-roundtrip: 93.5%, PASS
multi-tool: 87.3%, PASS
reasoning-retention: 98.5%, PASS
long-session-resume: 98.3%, PASS
mcp-hot-add: 97.5%, PASS, breaks=1
pro-one-shot: req=4, min-hit 98.3%, max-miss 124, breaks=2, PASS
Overall threshold: 85.0%, PASS
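For readers unfamiliar with how a hit percentage like those above can be derived, here is a minimal sketch. It assumes the provider reports `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` in each response's usage block, as the DeepSeek API does. How the #2306 tool actually aggregates scenarios is not described here, so `hitRate` and `passesThreshold` are illustrative only.

```typescript
// Usage counters as reported per request (DeepSeek-style field names).
interface CacheUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Token-weighted hit rate over a sequence of requests, in the range 0..1.
function hitRate(usages: readonly CacheUsage[]): number {
  let hit = 0;
  let miss = 0;
  for (const usage of usages) {
    hit += usage.prompt_cache_hit_tokens;
    miss += usage.prompt_cache_miss_tokens;
  }
  const total = hit + miss;
  // No prompt tokens at all: report 0 rather than dividing by zero.
  return total === 0 ? 0 : hit / total;
}

// Threshold is a fraction; 0.85 corresponds to the 85.0% line above.
function passesThreshold(usages: readonly CacheUsage[], threshold = 0.85): boolean {
  return hitRate(usages) >= threshold;
}
```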

npm run test -- tests/cache-guard.test.ts: passed, 3 tests

Risk Notes

Tool specs are now canonicalized by function name. This intentionally changes the diagnostic/order view of hot-added tools; API payload order becomes stable by name instead of registration timing.
Fold economics may delay normal-band summaries compared with the previous behavior. Aggressive/headroom-based folding still protects the model context window.
This PR intentionally excludes the local .gitignore change for AGENTS.md, which was already present in the working tree and is unrelated to cache efficiency.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 505a49801d

…ocale-independent (#2320)

Follow-up polish to the cache-efficiency guardrails (#2314):

- /status "cache detail" line was hard-coded English; route it through i18n (statusCacheDetail / statusCacheChurn) so it matches every other status row. EN + zh-CN translated; ja/de/ru inherit EN like the rest of their observability block.
- sortToolSpecs used localeCompare, which is locale-sensitive and could let the host locale reshuffle the serialized tool prefix and reintroduce the very cache churn the sort is meant to prevent. Switch to a stable codepoint compare. No change for ASCII tool names (all existing tests pass).

Co-authored-by: yhh <yhh@yhhdeMac-mini.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(cache): add cache efficiency guardrails

505a498

chatgpt-codex-connector Bot reviewed May 29, 2026

Comment thread src/loop.ts Outdated

fix(cache): hash sent tool snapshot for diagnostics

2d9a8e3

esengine merged commit fccea10 into esengine:main May 30, 2026
4 checks passed

esengine mentioned this pull request May 30, 2026

chore(cache): localize /status cache-detail line + locale-independent tool sort #2320

Merged

esengine merged 2 commits into esengine:main from SivanCola:codex/cache-efficiency-guardrails
