Offline Cache Regression Guard for High Cache-Hit Stability #2306

Closed
SivanCola wants to merge 1 commit into esengine:v1 from SivanCola:codex/cache-guard-tool

@SivanCola

Collaborator

Summary

Add a dedicated offline cache regression guard for Reasonix's DeepSeek prompt-cache path.

This PR introduces npm run cache:guard, backed by a synthetic DeepSeek driver that runs real CacheFirstLoop flows without making network calls. The guard captures each outbound chat request, renders a deterministic cache surface, estimates adjacent-request cache-hit ratios, and fails when a transition that should stay warm unexpectedly changes the immutable prefix or drops below the configured hit-ratio threshold.
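
The adjacent-request estimate described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the helper names `commonPrefixLength` and `estimateHitRatio` are hypothetical, and the sketch compares characters of two rendered surfaces, whereas the provider's real cache operates on tokens.

```typescript
// Hypothetical helpers for illustration; these names do not come from the PR.

// Number of leading characters two rendered surfaces share.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Assumed definition: the fraction of the next request that could be
// served from a prefix already seen in the previous request.
function estimateHitRatio(previous: string, next: string): number {
  if (next.length === 0) return 1; // nothing to send, nothing to miss
  return commonPrefixLength(previous, next) / next.length;
}

// An append-only continuation keeps the whole previous request as its prefix.
const turn1 = "SYSTEM:you are helpful\nUSER:hi\n";
const turn2 = turn1 + "ASSISTANT:hello\nUSER:list files\n";
const appendOnlyRatio = estimateHitRatio(turn1, turn2);

// A change near the start of the request invalidates almost everything after it.
const rewrittenRatio = estimateHitRatio(turn1, "SYSTEM:you are terse\nUSER:hi\n");
console.log({ appendOnlyRatio, rewrittenRatio });
```

Under this assumed definition, a large appended suffix lowers the ratio even when nothing in the prefix changed, which is why a threshold has to leave room for legitimately long continuations.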

Why this is necessary

Prompt-cache regressions are expensive but easy to miss:

  • They usually do not break functional correctness, so normal unit tests can stay green while every tool-call iteration becomes a cache miss.
  • A tiny request-shape change, such as losing an empty reasoning_content: "" field on historical thinking-mode assistant messages, can invalidate DeepSeek's prefix cache even when the rendered conversation looks semantically identical.
  • These regressions usually become visible only through production billing or provider usage fields, which is too late for PR review.
  • Reasonix has several legitimate cache-break paths, such as MCP tool hot-add and Flash -> Pro one-shot escalation. A guard needs to distinguish expected one-time breaks from accidental persistent cache churn.
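
To make the reasoning_content point above concrete, here is a small sketch of how dropping an empty field moves the point at which two otherwise equivalent histories diverge. It assumes the request body is serialized with `JSON.stringify` in field insertion order; the message contents and the `firstDivergence` helper are made up for illustration and are not taken from the PR.

```typescript
// Illustrative only: message shapes and helper names are assumptions.
interface AssistantMessage {
  role: "assistant";
  reasoning_content?: string;
  content: string;
}

// Index of the first character at which two strings differ.
function firstDivergence(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a[i] !== b[i]) return i;
  }
  return limit;
}

const userTurn = { role: "user", content: "Run the tests." };
const followUp = { role: "user", content: "Now fix the failing one." };

// Same semantic message, with and without the empty field.
const withField: AssistantMessage = { role: "assistant", reasoning_content: "", content: "All green." };
const withoutField: AssistantMessage = { role: "assistant", content: "All green." };

const wireBefore = JSON.stringify([userTurn, withField, followUp]);
const wireAfter = JSON.stringify([userTurn, withoutField, followUp]);

// Everything after this index can no longer match the previously cached prefix.
const divergeAt = firstDivergence(wireBefore, wireAfter);
console.log(divergeAt, wireAfter.length);
```

The two histories read the same to a human, but the serialized bytes split inside the historical assistant message, so every later message falls outside the shared prefix.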

The new tool gives CI a cheap, deterministic check for this class of cost regressions before code is merged.

What it covers

The built-in scenarios exercise the main cache-sensitive dialogue shapes:

  • plain multi-turn dialogue
  • single tool-call round trip
  • multiple tool calls in one assistant step
  • thinking-mode reasoning retention / pruning behavior
  • long-session resume with more than 200 retained messages
  • MCP-like tool hot-add, allowing exactly one expected cache break and requiring the next turn to warm again
  • Flash -> Pro one-shot escalation, allowing the model-switch breaks and requiring the restored Flash path to warm again
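
The expected-break behavior in the last two scenarios can be sketched as a per-transition allowance. This is a hedged illustration under assumed names (`Transition`, `breakAllowed`, `findViolations`), not the guard's actual data model; the 0.85 threshold is the default stated in this PR.

```typescript
// Hypothetical model of a scenario: one entry per adjacent request pair.
interface Transition {
  label: string;
  hitRatio: number;      // estimated cache-hit ratio for this transition
  breakAllowed: boolean; // true only where a one-time break is expected
}

// Returns a message for every transition that is cold without permission.
function findViolations(transitions: Transition[], threshold: number): string[] {
  const violations: string[] = [];
  for (const t of transitions) {
    const warm = t.hitRatio >= threshold;
    if (!warm && !t.breakAllowed) {
      violations.push(`${t.label}: hit ratio ${t.hitRatio.toFixed(3)} is below ${threshold}`);
    }
  }
  return violations;
}

// Tool hot-add: one permitted break, then the next turn must be warm again.
const hotAddOk: Transition[] = [
  { label: "turn 1 -> 2", hitRatio: 0.95, breakAllowed: false },
  { label: "tool hot-add", hitRatio: 0.1, breakAllowed: true },
  { label: "turn after hot-add", hitRatio: 0.93, breakAllowed: false },
];

// Regression: the cache never warms up after the permitted break.
const hotAddChurn: Transition[] = [
  { label: "turn 1 -> 2", hitRatio: 0.95, breakAllowed: false },
  { label: "tool hot-add", hitRatio: 0.1, breakAllowed: true },
  { label: "turn after hot-add", hitRatio: 0.12, breakAllowed: false },
];

console.log(findViolations(hotAddOk, 0.85));
console.log(findViolations(hotAddChurn, 0.85));
```

Marking the allowance on a specific transition, rather than counting breaks per scenario, is one way to distinguish a single expected break from persistent churn; the PR may model this differently.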

The cache surface intentionally includes model, tool schemas, system messages, normalized conversation messages, and field presence for reasoning_content, so the earlier high-cost failure mode is covered directly.
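
As a rough sketch of what such a surface might look like, the following renders one line per cache-relevant element and records whether reasoning_content is present on each message. The `ChatRequest` shape, the `renderSurface` name, the line format, and the model and tool names are all assumptions for illustration; the PR's actual rendering is not shown on this page.

```typescript
// Assumed request shape; the real driver's types may differ.
interface ChatMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

interface ChatRequest {
  model: string;
  tools: object[];
  messages: ChatMessage[]; // system messages are entries with role "system"
}

// Renders a deterministic, line-oriented view of everything that can
// affect the cached prefix, including field presence.
function renderSurface(req: ChatRequest): string {
  const lines: string[] = [`model=${req.model}`];
  for (const tool of req.tools) {
    lines.push(`tool=${JSON.stringify(tool)}`);
  }
  for (const m of req.messages) {
    // Presence is tracked separately from value, so "" and "missing" differ.
    const reasoning = "reasoning_content" in m ? "present" : "absent";
    lines.push(`${m.role}[reasoning_content:${reasoning}]=${JSON.stringify(m.content)}`);
  }
  return lines.join("\n");
}

const baseRequest: ChatRequest = {
  model: "example-model", // placeholder model name
  tools: [{ name: "read_file", description: "Read a file" }],
  messages: [
    { role: "system", content: "You are a coding agent." },
    { role: "user", content: "Open README.md" },
    { role: "assistant", reasoning_content: "", content: "Reading it now." },
  ],
};

// Same conversation, but the empty field was lost on the assistant message.
const droppedField: ChatRequest = {
  ...baseRequest,
  messages: baseRequest.messages.map((m) => ({ role: m.role, content: m.content })),
};

console.log(renderSurface(baseRequest) === renderSurface(droppedField)); // false
```

Because presence is rendered explicitly, a diff of two surfaces points directly at the message that lost its field.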

Usage

npm run cache:guard

Optional flags:

npm run cache:guard -- --threshold=0.92
npm run cache:guard -- --json
npm run cache:guard -- --keep-temp

The default threshold is 85%, chosen so naturally larger multi-tool continuations remain valid while structural prefix regressions still fail loudly.

CI integration

Adds a Cache guard step to the main CI workflow after lint/typecheck and before build/test coverage. This makes the guard run on both PRs and pushes to main.

Verification

  • npm run cache:guard passes; lowest built-in scenario hit ratio: 87.1%.
  • npm run test -- tests/cache-guard.test.ts passes: 3 tests.
  • npm run typecheck passes.
  • npm run lint exits 0; it still reports the existing src/cli/ui/PlanPanel.tsx import-type warning unrelated to this PR.
  • Pre-push npm run verify passes: build, lint, typecheck, and full test suite (315 test files, 4044 tests passed, 9 skipped).

@Bernardxu123

Collaborator

🙏 Sincere thanks for your contribution!

Thank you for submitting this PR to the DeepSeek-Reasonix v1 branch. These improvements reflect your care for the project and your technical expertise.

About the v1 branch:

v1 (0.x, TypeScript) is no longer maintained at all. The project has fully migrated to v2 (Go rewrite, main-v2 branch), a from-scratch rewrite with a new architecture.

Since v1 no longer accepts code changes, this PR will be closed.

We warmly welcome you to contribute these fixes to the v2 branch! If you are interested, you can:

  1. Check v2 Issues for related issues
  2. Create a new PR based on the main-v2 branch
  3. See CONTRIBUTING.md for the contribution guidelines

Thank you again for your time and effort! Your contributions are very valuable to the open-source community. ❤️

Labels

v1 Legacy TypeScript line (0.x) — v1 branch, maintenance only
