Cache Efficiency Guardrails and Diagnostics #2314
Merged
esengine merged 2 commits on May 30, 2026
💡 Codex Review
Reviewed commit: 505a49801d
esengine added a commit that referenced this pull request on May 30, 2026:
…ocale-independent (#2320)
Follow-up polish to the cache-efficiency guardrails (#2314):
- /status "cache detail" line was hard-coded English; route it through i18n (statusCacheDetail / statusCacheChurn) so it matches every other status row. EN + zh-CN translated; ja/de/ru inherit EN like the rest of their observability block.
- sortToolSpecs used localeCompare, which is locale-sensitive and could let the host locale reshuffle the serialized tool prefix and reintroduce the very cache churn the sort is meant to prevent. Switch to a stable codepoint compare. No change for ASCII tool names (all existing tests pass).
Co-authored-by: yhh <yhh@yhhdeMac-mini.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
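The locale issue described in this commit can be illustrated with a minimal sketch. The spec shape (`function.name`) and the names `compareByCodePoint` and `sortToolSpecsSketch` are assumptions made for illustration; the repository's actual sortToolSpecs may differ.

```typescript
// Hypothetical tool-spec shape; the real type in the repo may differ.
interface ToolSpecLike {
  function: { name: string };
}

// Compare two strings by Unicode code point, independent of host locale.
function compareByCodePoint(a: string, b: string): number {
  const xs = Array.from(a); // Array.from splits a string by code point
  const ys = Array.from(b);
  const n = Math.min(xs.length, ys.length);
  for (let i = 0; i < n; i++) {
    const d = xs[i]!.codePointAt(0)! - ys[i]!.codePointAt(0)!;
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  if (xs.length === ys.length) return 0;
  return xs.length < ys.length ? -1 : 1;
}

// Return a sorted copy so the caller's array is left untouched.
function sortToolSpecsSketch<T extends ToolSpecLike>(specs: readonly T[]): T[] {
  return [...specs].sort((a, b) =>
    compareByCodePoint(a.function.name, b.function.name),
  );
}
```

Unlike localeCompare, this ordering depends only on the strings themselves, so uppercase ASCII names always sort before lowercase ones regardless of the machine's locale settings.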
Summary
This PR adds cache-efficiency guardrails for DeepSeek-style high cache-hit sessions. It focuses on keeping the immutable prefix byte-stable, making cache churn observable, and avoiding compaction decisions that reduce total cache efficiency.
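This change makes six kinds of improvements in support of a high cache-hit rate: stabilizing the prefix shape, diagnosing cache churn, controlling tool schema cost, improving fold economics, trimming reasoning history, and filling in probe/test/replay/UI compatibility.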
Major Changes: Principle and Effect
1. Prefix-shape diagnostics and cache churn attribution
Principle: provider-side prompt cache is sensitive to the byte shape of the reusable request prefix. Even when semantic content is unchanged, system prompt, tool schema ordering, few-shot payloads, or transcript rewrites can turn a warm prefix into a cold one. The new diagnostics hash these prefix components and compare snapshots across turns.
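A minimal sketch of the hash-and-compare idea, assuming a Node.js runtime. The component names (`system`, `tools`, `fewShots`) and the functions `snapshotPrefix` and `diffPrefix` are hypothetical and are not taken from the PR; transcript rewrites are left out for brevity.

```typescript
import { createHash } from "node:crypto";

// Hypothetical prefix components; the PR's real snapshot may track more.
type PrefixComponent = "system" | "tools" | "fewShots";
type PrefixSnapshot = Record<PrefixComponent, string>;

interface PrefixParts {
  system: string;
  tools: unknown[];
  fewShots: unknown[];
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hash each component of the request prefix as it would be serialized.
function snapshotPrefix(parts: PrefixParts): PrefixSnapshot {
  return {
    system: sha256(parts.system),
    tools: sha256(JSON.stringify(parts.tools)),
    fewShots: sha256(JSON.stringify(parts.fewShots)),
  };
}

// List which components changed between two turns.
function diffPrefix(prev: PrefixSnapshot, next: PrefixSnapshot): PrefixComponent[] {
  const keys: PrefixComponent[] = ["system", "tools", "fewShots"];
  return keys.filter((k) => prev[k] !== next[k]);
}
```

Comparing snapshots from consecutive turns then attributes a cold turn to a specific component: a reordered tool list with identical contents, for example, shows up as a change in `tools` only.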
Effect: runtime stats can now explain why a cache miss happened instead of only reporting that it happened.
/status, telemetry stats, replay summaries, and cache-related UI surfaces can show miss tokens, schema tokens, prefix changes, and top tool schema contributors. This also complements the existing /cache-miss-report path from upstream by adding local shape-level attribution.
2. Stable tool schema ordering and schema cost governance
Principle: the complete tool list is part of the model request prefix. If MCP reconnects or dynamic registration produce the same logical tool set in a different order, the serialized request changes and cache reuse can be lost. Sorting tool specs by function name makes the schema prefix deterministic. Estimating each schema's token cost makes large tool definitions visible.
Effect: avoidable cold turns caused by MCP reconnect/order churn are reduced. The UI can surface expensive tool schemas in the context breakdown, so cache and token issues caused by large schemas are easier to diagnose. Tests now lock the reconnect/prefix invariants around this behavior.
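The schema-cost side can be sketched as follows. The four-characters-per-token heuristic is an assumption chosen for illustration and is not the estimator used in the PR; `estimateSchemaTokens` and `topSchemaContributors` are likewise hypothetical names.

```typescript
// Hypothetical tool-spec shape for illustration.
interface CostedToolSpec {
  function: { name: string; description?: string; parameters?: unknown };
}

interface SchemaCost {
  name: string;
  tokens: number;
}

// Rough estimate: about 4 characters of serialized JSON per token (assumption).
function estimateSchemaTokens(spec: CostedToolSpec): number {
  return Math.ceil(JSON.stringify(spec).length / 4);
}

// Rank tools by estimated schema cost, most expensive first.
// Ties break on name so the ranking itself stays deterministic.
function topSchemaContributors(specs: readonly CostedToolSpec[], limit = 3): SchemaCost[] {
  return specs
    .map((s) => ({ name: s.function.name, tokens: estimateSchemaTokens(s) }))
    .sort((a, b) => b.tokens - a.tokens || (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .slice(0, limit);
}
```

A context breakdown view could render the returned list directly, which is enough to point at an oversized schema without needing a real tokenizer.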
3. Fold economics for compaction decisions
Principle: summarization/folding is not free. A fold creates a new summary segment that is cold at first and adds immediate request cost. Normal-band folding should only happen when the expected multi-turn savings exceed the summary and post-fold cold tax. Aggressive folding still protects the context window when headroom is genuinely low.
Effect: the context manager avoids cost-negative folds that would lower cache efficiency in medium-length sessions, while still preserving safety near context limits. New tests cover both the conservative normal-band behavior and cache-aligned fold invariants.
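The break-even reasoning above can be sketched as a small decision function. Every field name, the cost model, and the 10% aggressive-headroom threshold are assumptions for illustration; the PR's context manager may weigh these quantities differently.

```typescript
// All fields are hypothetical inputs to an illustrative cost model.
interface FoldEstimate {
  foldedTokens: number;           // history tokens the fold would remove
  summaryTokens: number;          // tokens in the new summary segment
  coldTaxTokens: number;          // previously warm tokens that go cold after the fold
  expectedRemainingTurns: number; // turns over which savings would accrue
  cachedCostRatio: number;        // cost of a cached token relative to an uncached one
  headroomRatio: number;          // fraction of the context window still free
}

function shouldFold(e: FoldEstimate, aggressiveHeadroom = 0.1): boolean {
  // Safety first: near the context limit, fold regardless of economics.
  if (e.headroomRatio < aggressiveHeadroom) return true;

  // The folded tokens would otherwise be re-read from cache every turn,
  // so the per-turn saving is priced at the cached rate.
  const savedPerTurn = (e.foldedTokens - e.summaryTokens) * e.cachedCostRatio;

  // The summary is cold on first use, and the fold invalidates warm prefix.
  const upfrontCost = e.summaryTokens + e.coldTaxTokens;

  return savedPerTurn * e.expectedRemainingTurns > upfrontCost;
}
```

Under this model a fold that removes 10,000 tokens at a 0.1 cached-cost ratio saves only about 900 token-equivalents per turn against a 1,000-token summary, so a 20,000-token cold tax needs more than 23 remaining turns to pay off.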
4. Reasoning retention and healing for tool-call history
Principle: thinking/reasoning models need reasoning fields to round-trip correctly for assistant messages that contain tool calls. However, stale plain assistant reasoning can bloat future request bodies and make prefix shape less stable. The healing path now strips stale plain reasoning while re-stamping only the tool-call assistant turns that require reasoning continuity.
Effect: tool-call transcripts remain API-safe for reasoning models, while unnecessary reasoning payload is removed from future requests. This reduces request bloat and lowers the chance of cache churn from stale assistant-only reasoning content.
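One way to picture the healing pass is the sketch below. The message shape, the `reasoning_content` field name, and the choice to re-stamp a missing value with an empty string are all assumptions; the PR does not spell out these details here.

```typescript
// Hypothetical chat message shape; field names are assumptions.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  reasoning_content?: string;
  tool_calls?: { id: string }[];
}

// Strip reasoning from plain assistant turns; keep (or re-stamp) it on
// assistant turns that carry tool calls. Returns new objects only for
// assistant messages and leaves the input array unmodified.
function healReasoning(history: readonly ChatMessage[]): ChatMessage[] {
  return history.map((msg) => {
    if (msg.role !== "assistant") return msg;

    const hasToolCalls = (msg.tool_calls?.length ?? 0) > 0;
    if (hasToolCalls) {
      // Assumption: "re-stamping" means making sure the field is present.
      return { ...msg, reasoning_content: msg.reasoning_content ?? "" };
    }

    const healed: ChatMessage = { ...msg };
    delete healed.reasoning_content; // stale plain reasoning is dropped
    return healed;
  });
}
```

Running the pass over stored history before building the next request keeps tool-call turns round-trippable while preventing old free-form reasoning from inflating the request body.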
5. Probe and regression guardrails
Principle: cache behavior should be testable without relying only on provider internals. Deterministic shape tests verify local invariants, and live/offline probes measure whether those invariants translate into high cache-hit behavior over realistic loops.
Effect: this PR adds scripts/probe-cache-shape.mts, updates loop and long-session probes, and adds tests for cache shape and fold economics. I also ran the testing tool from PR #2306 against this branch via a temporary overlay, so the change is validated by the new offline cache guard scenarios as well.
6. Documentation, replay, and UI compatibility
Principle: adding cache summary fields is only useful if old transcripts and replay paths remain readable. UI and replay defaults must tolerate sessions that were recorded before these fields existed.
Effect: App, ReplayApp, transcript replay, localized labels, and real-world cache benchmark docs were updated together. Existing sessions stay backwards compatible, and benchmark documentation now explicitly calls out expected cold summary segments after compaction.
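Tolerating older recordings usually comes down to defaulting every new field at read time. The following is a minimal sketch; the record shape and the field names (`missTokens`, `schemaTokens`, `prefixChanges`) are hypothetical stand-ins for whatever the replay code actually stores.

```typescript
// Hypothetical cache summary fields, named after the stats listed earlier.
interface CacheSummary {
  missTokens: number;
  schemaTokens: number;
  prefixChanges: number;
}

// Older transcript records may omit the summary or only part of it.
interface TranscriptRecord {
  cache?: Partial<CacheSummary>;
}

// Read a summary from any record, filling absent fields with zeros.
function readCacheSummary(record: TranscriptRecord): CacheSummary {
  const c: Partial<CacheSummary> = record.cache ?? {};
  return {
    missTokens: c.missTokens ?? 0,
    schemaTokens: c.schemaTokens ?? 0,
    prefixChanges: c.prefixChanges ?? 0,
  };
}
```

Because the defaults are applied per field, a transcript recorded midway through the rollout with only some fields present still renders without special handling.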
Verification
- git diff --cached --check: passed
- npm run typecheck: passed
- npm run lint: passed with one existing non-fatal warning in src/cli/ui/PlanPanel.tsx about a type-only React import
- npx vitest run tests/cache-shape.test.ts tests/context-manager-fold-economics.test.ts tests/ctx-breakdown.test.ts tests/mcp-reconnect-prefix-invariant.test.ts tests/telemetry.test.ts tests/loop-r1-reasoning.test.ts tests/context-manager-cache-aligned-fold.test.ts: passed, 84 tests
- npx tsx scripts/probe-cache-shape.mts: passed
- npm run build:dashboard: passed
- npx vitest run tests/loop.test.ts tests/dashboard-smoke.test.ts: passed, 85 passed and 1 skipped
- npm test -- --run: passed, 315 files passed and 1 skipped, 4047 tests passed and 12 skipped
- npm run build && npm run lint && npm run typecheck && npm run test --silent: passed, 316 files passed, 4050 tests passed and 9 skipped
PR #2306 cache guard run
I fetched the testing tool from #2306 and ran it against a temporary overlay of this final branch.
- npm run cache:guard: passed
  - plain-dialogue: 98.4%, PASS
  - tool-roundtrip: 93.5%, PASS
  - multi-tool: 87.3%, PASS
  - reasoning-retention: 98.5%, PASS
  - long-session-resume: 98.3%, PASS
  - mcp-hot-add: 97.5%, PASS, breaks=1
  - pro-one-shot: req=4, min-hit 98.3%, max-miss 124, breaks=2, PASS
- npm run test -- tests/cache-guard.test.ts: passed, 3 tests
Risk Notes
.gitignore change for AGENTS.md, which was already present in the working tree and is unrelated to cache efficiency.