Skip to content

fix(core): stabilize DeepSeek tool cache prefix#4518

Closed
Jerry2003826 wants to merge 6 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/deepseek-cache-prefix-stability
Closed

fix(core): stabilize DeepSeek tool cache prefix#4518
Jerry2003826 wants to merge 6 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/deepseek-cache-prefix-stability

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 25, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Stabilizes DeepSeek tool-cache prefixes by keeping tool declarations in a deterministic order across turns.

Why it's needed

DeepSeek prompt caching is prefix-sensitive; tool-order drift can destroy cache hits even when the effective tool set is unchanged.

Reviewer Test Plan

How to verify

Run the targeted DeepSeek cache/tool-order tests added in this PR, then run eslint/prettier/typecheck for the touched core files. GitHub CI also validates lint and tests on macOS, Windows, and Linux.

Evidence (Before & After)

Before: logically identical turns could produce different tool declaration ordering and miss prefix cache reuse. After: regression coverage verifies stable ordering. This is provider prompt/caching behavior, so TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: Limited to deterministic ordering of tool declarations; tool availability and execution behavior are unchanged.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

References the DeepSeek cache-prefix stability issue discussed in this PR.

中文说明

这个 PR 做了什么

通过稳定工具声明顺序,改善 DeepSeek 前缀缓存稳定性。

为什么需要

DeepSeek 缓存对 prompt 前缀非常敏感;工具顺序漂移会让本应命中的缓存失效。

Reviewer Test Plan

本地按 PR 相关测试、eslint、prettier check、typecheck 验证;macOS/Linux 由 GitHub CI 验证。

风险和范围

只影响工具声明的确定性排序,不改变工具可用性或执行行为。

@Jerry2003826 Jerry2003826 marked this pull request as ready for review May 25, 2026 21:27
wenshao
wenshao previously approved these changes May 26, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean, well-motivated refactor. The getCachedPromptTokens extraction is a good de-duplication, and the hostname-gated tool sort is correctly scoped.

Minor suggestions for follow-up (low confidence, not blocking):

  • Consider adding a test that verifies the getCachedPromptTokens precedence chain (e.g., prompt_tokens_details.cached_tokens wins over prompt_cache_hit_tokens when both are present).
  • Consider adding a streaming-path test for convertOpenAIChunkToGemini with DeepSeek cache fields.
  • prompt_cache_miss_tokens is declared in ExtendedCompletionUsage but never read — consider removing or documenting intent.

— qwen3.7-max via Qwen Code /review

@pomelo-nwu pomelo-nwu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉

As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.

Specifically, for each PR please include:

  • How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
  • Evidence (Before & After) — use the tmux-real-user-testing skill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge
  • Tested on — fill in the OS table (macOS / Windows / Linux)

PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.

You can see the full template at: .github/pull_request_template.md

Thanks again for your effort — looking forward to getting these merged! 🚀

中文说明

你好 @Jerry2003826,感谢你的持续贡献——短时间内提交了 9 个 PR,非常高效!🎉

在 review 过程中,我们希望你能按照 main 分支上最新的 PR 模版更新每个 PR 的描述。其中最关键的部分是 Reviewer Test Plan,它能显著加速审核和合并流程。

具体来说,请为每个 PR 补充:

  • How to verify — 清晰的复现步骤,让 reviewer 能确认修复/功能的效果
  • Evidence (Before & After) — 使用 tmux-real-user-testing skill(或手动 tmux 截取)展示修改前后的 TUI 截图对比,前后对比能让维护者更快地验证和合并
  • Tested on — 填写操作系统测试表格(macOS / Windows / Linux)

有完整 Reviewer Test Plan 的 PR 会被优先审核——缺少该部分可能会导致审核延迟。

完整模版见:.github/pull_request_template.md

再次感谢你的付出,期待尽快把这些 PR 合并!🚀

— Qwen Code

@wenshao wenshao added the type/bug Something isn't working as expected label May 26, 2026
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated the PR description to the latest template, including Reviewer Test Plan, Evidence, Tested on, Risk & Scope, and the Chinese summary.

No code changes were needed for this update; the existing validation remains:

npm test --workspace=packages/core -- src/core/openaiContentGenerator/pipeline.test.ts
npm test --workspace=packages/core -- src/core/openaiContentGenerator/converter.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check

@wenshao

wenshao commented May 26, 2026

Copy link
Copy Markdown
Collaborator

本地验证报告 — PR #4518

作为 maintainer 在本地用 tmux 跑了完整真实验证。结论:通过,建议合并

验证环境

平台 macOS 15.4 (Darwin 25.4.0)
Node / npm v22.17.0 / 11.8.0
工作树 /private/tmp/pr4518-test
PR HEAD ee823dc6c (fix(core): stabilize DeepSeek tool cache prefix)
origin/main 的 merge-base e8b79d7721(即当前 main HEAD)
git merge-tree origin/main pr-4518-test 干净(exit 0、无冲突标记)
提交数 1
tmux session pr4518-test(4 windows: install / unit-test / manual-repro / misc)

1. 单元测试

Suite Tests
core/openaiContentGenerator/pipeline.test.ts 44 ✅
core/openaiContentGenerator/converter.test.ts 103 ✅
core/openaiContentGenerator/provider/deepseek.test.ts(间接) 25 ✅
core/openaiContentGenerator/provider/dashscope.test.ts(间接) 56 ✅
providers/__tests__/presets/deepseek.test.ts(间接) 3 ✅
core/anthropicContentGenerator/converter.test.ts(下游) 50 ✅
core/anthropicContentGenerator/usage.test.ts(下游) 7 ✅
core/loggingContentGenerator/loggingContentGenerator.test.ts(下游) 43 ✅
telemetry/loggers.test.ts(下游) 45 ✅
合计 376 / 376 ✅

间接 + 下游测试是通过 grep -r --include="*.test.ts" -E "isDeepSeekHostname|isDeepSeekProvider|api\.deepseek\.com|prompt_cache_hit_tokens|prompt_cache_miss_tokens|getCachedPromptTokens|sortToolsForCacheStableRequest|cachedContentTokenCount" 搜出所有相关 test 文件,确保 cachedContentTokenCount 的下游消费方(anthropic converter、logging、telemetry)不会因为新增的 fallback 路径而漂移。

2. 静态检查

TSC=0      # npx tsc --noEmit -p packages/core/tsconfig.json
ESLINT=0   # 4 个改动文件,--max-warnings 0

3. 真实端到端复现(用 tsx 直接 import 源码)

独立 repro 脚本tsx 直接 import 真实源码(provider/deepseek.tsconverter.ts),把 PR 的两半都端到端跑透:

Case 1: isDeepSeekHostname() — strict hostname gate
  PASS  https://api.deepseek.com/v1 → true (official DeepSeek root)
  PASS  https://api.deepseek.com → true (official without /v1)
  PASS  https://sub.api.deepseek.com/v1 → true (subdomain of api.deepseek.com)
  PASS  https://api.deepseek.com.evil.com/v1 → false (hostile lookalike, suffix attack)
  PASS  https://api-deepseek.com/v1 → false (dashed lookalike)
  PASS  https://api.example.com/v1 → false (unrelated host)
  PASS  https://localhost:8080/v1 → false (self-hosted)
  PASS  (empty) → false (empty baseUrl)
  PASS  not-a-url → false (malformed URL)
  PASS  https://API.DEEPSEEK.COM/v1 → true (uppercase normalised)
  ─ gating 是基于 `new URL()` 解析后的精确 hostname,不是子串匹配 ——
    `api.deepseek.com.evil.com` 这种攻击面正确拒绝。

Case 2: sortToolsForCacheStableRequest sorts tools by function name
  PASS  result order: alpha,bravo,zeta

Case 3: caller order does not affect post-sort output
  PASS  all permutations sort to: alpha,bravo,zeta
  ─ 三种不同初始顺序(A/B/C 三种排列)都收敛到同一最终序列。
    这正是 DeepSeek prefix-exact cache 要求的:tool 注册顺序漂移不再污染 prompt 前缀。

Case 4: sort is a no-op for < 2 tools or undefined
  PASS  undefined tools / 空 tools / 单个 tool 都不会改动
Case 5: sort uses a fresh array — caller order intact
  PASS  caller 持有的 [...tools] 顺序不被修改(用了 [...request.tools].sort(...))
  PASS  request.tools 是新分配的数组
Case 6: same function name disambiguated by serialised tool
  PASS  同名 tool 用 JSON.stringify(tool) 做次级 tie-break,仍是确定序

Case 7: convertOpenAIResponseToGemini maps prompt_cache_hit_tokens
  PASS  promptTokenCount=1000
  PASS  cachedContentTokenCount=800 (PR 之前是 0 —— 这就是修复点)
  PASS  candidatesTokenCount=50

Case 8: precedence — prompt_tokens_details.cached_tokens > cached_tokens > prompt_cache_hit_tokens
  PASS  all three present → OpenAI 标准字段优先 (got 11)
  PASS  OpenAI 缺失 → top-level cached_tokens 次优先 (got 22)
  PASS  仅 DeepSeek 字段 → prompt_cache_hit_tokens 兜底 (got 33)
  PASS  全无 → 0 (无回归, got 0)
  ─ 新增的 fallback 优先级最低,确保非 DeepSeek(包括官方 OpenAI、Qwen、DashScope)
    走 cached_tokens 的现有行为完全不变。

Case 9: convertOpenAIChunkToGemini also maps prompt_cache_hit_tokens
  PASS  streaming cachedContentTokenCount=400 (与非 streaming 路径同步)
  ─ 两条路径都改了,DeepSeek 流式响应里的 cache hit 也能被采集。

PR 4518 repro: 26 pass, 0 fail

4. 代码层面的观察

正向

  • hostname gating 写得很扎实:复用了仓库里已有的 isDeepSeekHostname(不是新写一个),且该函数本身就用 new URL(...).hostname 精确匹配 api.deepseek.com*.api.deepseek.com。Case 1 验证了 10 种 baseUrl 输入,包括 api.deepseek.com.evil.com 这种 suffix 攻击面 —— 与 PR 第 2 行 "the broader is*Provider check would false-positive on hostile hosts" 的设计意图一致。
  • gating 收得很窄:只在官方 hostname 触发 sort,扩展到 isDeepSeekProvider(包含 model-name 启发式)。理由很合理:sort 是「为 DeepSeek 的 prefix-exact KV cache 优化」的副作用,其他 OpenAI-compat 后端不应该被改变行为。PR 描述明确说了这是设计意图。
  • sort 是 immutable 风格[...request.tools].sort(...) 不修改 caller 数组(Case 5 验证)。
  • sort key 用 JSON.stringify 做次级 tie-break:同名 tool(schema 不同)也有稳定排序,避免 Array.prototype.sort 的非稳定实现造成漂移(V8 现在是 stable,但代码不依赖这个)。
  • cache 字段优先级安排合理prompt_tokens_details.cached_tokens(OpenAI 标准)→ 顶层 cached_tokensprompt_cache_hit_tokens(DeepSeek 新加)。新字段是 fallback 最末位,所以其他 provider 行为零变化(Case 8 用 4 种组合验证)。
  • streaming + 非 streaming 都改了getCachedPromptTokens()convertOpenAIResponseToGeminiconvertOpenAIChunkToGemini 两处都用上,DeepSeek 流式响应也能采集到 cache hit(Case 9 验证)。
  • 重构干净:把重复的 usage.prompt_tokens_details?.cached_tokens ?? extendedUsage.cached_tokens ?? 0 提成 getCachedPromptTokens() 单一来源,未来再加 provider 字段只改一处。
  • 行为契约清晰:JSDoc 注释明确写「sort only for official DeepSeek endpoints to avoid surprising other OpenAI-compatible providers」—— 未来回看代码的人能立刻明白为什么有 hostname 守卫。

潜在风险(不阻塞,仅记录)

  • JSON.stringify(tool) 在大 tool 集 / 大 schema 下有 O(n) 成本:Case 6 的 tie-break 走的是 JSON.stringify(tool),每个 tool 都序列化一次,外层再 Array.prototype.sort 大约 O(n·log n) 次比较。常见场景 tool 数 < 50,schema 也不大(< 几 KB),所以单次请求总成本应该在 1ms 量级以下,可接受。若有团队跑大量 tool(> 200)且 schema 巨大,可以单独优化。
  • 官方 hostname 是硬编码api.deepseek.com。如果 DeepSeek 未来变动品牌或加新区域 hostname(比如 api.deepseek-cn.com),需要同步更新;但与其他 hostname-gate 的代码一致,是 codebase 一贯写法。
  • PR 描述里 "Windows local npm workspace test environment, ?? not tested macOS" 表格中的 ✅ 是 Windows。本次本地验证补齐了 macOS。

5. 没在本地覆盖

  • 真实击中 DeepSeek 官方 API 跑一次(需要 DEEPSEEK_API_KEY 和真实账户付费)—— wire 层已用 Case 1–6 全覆盖。
  • Windows / Linux 真机(仅 macOS)—— CI 三平台都 ✅。

结论

维度 状态
全部单元测试(9 文件 / 376 用例)
TSC 类型检查
ESLint(4 改动文件,零警告)
真实路径端到端 repro(26 项契约断言)
git merge-tree 与最新 main ✅ 无冲突
CI 远端(Lint / 3 平台 Test / CodeQL)
GitHub mergeable 标志 ✅ MERGEABLE
hostname 攻击面(suffix / dashed lookalike) ✅ 全部拒绝
非 DeepSeek provider 行为 ✅ 零变化(gated + cache fallback 优先级最末位)
Breaking change ✅ 无

建议合并。 没有阻塞问题。

— wenshao

@wenshao

wenshao commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Live DeepSeek API 验证 — PR #4518 补充

之前的 验证报告 跑的是单元测试 + tsx 端到端,但没击中真实的 DeepSeek API。这次补充用 ~/.qwen/settings.jsondeepseek-v4-pro 配置的真实 AK,直接打 https://api.deepseek.com/v1 验证 PR 的核心行为契约:「同 tool 集,乱序送 → 缓存击穿」。

1. 验证方式

写了一个独立脚本 scripts/deepseek-live-repro.mts,用真实 openai SDK + 真实 endpoint,把 PR 描述里 "reversing only tool order dropped cache hits from about 99.5% to 0%" 这条断言端到端复现一遍:

  1. warm:30 个 verbose tool(schema 较大)按 canonical 顺序送出 → 冷启动
  2. repeat-canonical:完全相同的请求再送一遍 → 应该命中缓存
  3. shuffled-unsorted:同样的 tool 集,但用 Fisher-Yates 洗牌后调用 PR 的 sort → 模拟 PR 之前的行为
  4. shuffled-then-sorted:同样的洗牌后的 tool 集,调用 PR 的 sortToolsForCacheStableRequest再送 → 模拟 PR 之后的行为
  5. converter 映射验证:把最后一次响应丢进 convertOpenAIResponseToGemini,检查 cachedContentTokenCount 是否等于 DeepSeek 返回的 prompt_cache_hit_tokens

所有请求 model=deepseek-v4-pro, tool_choice='auto', temperature=0, max_tokens=8,每次间隔 1.5s 避免 rate-limit。

重要踩坑:第一次跑用了 tool_choice: 'none',结果发现 DeepSeek 在 tool_choice='none'根本不把 tools 计入 prompt_tokens(实测 prompt_tokens 只有 12 而非 17000+),自然也不进入缓存键。改成 'auto' 之后才真正复现 PR 的现象。其他同学跑类似验证时记得避坑。

2. 真实 API 跑出的结果

Step 0: warm up cache (canonical order, first call — expect mostly cold)
    [warm] prompt=8332, cache_hit=0, total=8340

Step 1: same canonical order — expect HIGH cache_hit (proves DeepSeek cache works)
    [repeat-canonical] prompt=8332, cache_hit=8320, total=8340
  PASS  cache_hit went from 0 (cold) to 8320 (warm). Δ=8320 tokens
  PASS  repeat canonical → cache_hit=8320 > 0

Step 2: SHUFFLED order, NOT sorted (simulates the pre-PR bug)
    [shuffled-unsorted] prompt=8332, cache_hit=0, total=8340
  PASS  cache_hit dropped from 8320 (sorted) to 0 (shuffled-unsorted). Δ=8320 tokens — this is the bug.

Step 3: SHUFFLED but sorted via PR helper (proves the fix)
    [shuffled-then-sorted] prompt=8332, cache_hit=8320, total=8340
  PASS  sort restores cache: cache_hit=8320 (vs warm canonical baseline 8320)
  PASS  sorted shuffle (8320) > unsorted shuffle (0) — sort closes the gap

Step 4: verify converter maps DeepSeek prompt_cache_hit_tokens into cachedContentTokenCount
  PASS  cachedContentTokenCount=8320 (matches DeepSeek's prompt_cache_hit_tokens=8320)

Live DeepSeek E2E: 6 pass, 0 fail

3. 命中率对照

场景 cache_hit 命中率 含义
cold(冷启动) 0 0.0% 基线
canonical repeat 8320 99.86% 缓存正常工作的上限
shuffled, NOT sorted 0 0.00% PR 之前的 bug —— 缓存被完全击穿
shuffled, sorted by PR helper 8320 99.86% PR 之后 —— 缓存命中完全恢复
Step 4 cache mapping cachedContentTokenCount = 8320 converter 正确映射

PR 描述里写的 "from about 99.5% to 0%" 在 deepseek-v4-pro 上 1:1 复现(实测 99.86% → 0% → 99.86%)。

4. 进一步探查:DeepSeek 缓存的真实行为

为了排除「是不是只是 prompt 前缀某段在缓存」,又写了 scripts/deepseek-cache-deep-probe.mts 做对照实验。用 50 个 bloated schema tool(prefix ~17000 token),跑 cold → warm → reverse → shuffle → 回到 canonical:

cold            prompt=17133, cached=0
warm            prompt=17133, cached=17024  ← 17024/17133 = 99.4% 命中
reversed tools  prompt=17133, cached=0      ← 反转 = 完全失效
shuffled tools  prompt=17133, cached=0      ← 随机打乱 = 完全失效
back to canonical prompt=17133, cached=17024 ← 恢复原序 = 缓存复活

结论:DeepSeek 的 prompt 缓存对 tools 数组顺序非常敏感,哪怕只是反转或洗牌(tool 集 / schema 完全相同),缓存就 100% 击穿。回到原始顺序,缓存立即恢复。这正是 PR 的 sort 修复要解决的问题。

5. 价值估算

按这次实测:每次「tool 顺序漂移」会让一次请求多消耗 ~8320 prompt tokens(按 deepseek-v4-pro 当前定价 cache miss 约 $0.14/M tokens vs cache hit 约 $0.014/M tokens 算,单次约多花 0.1 美分 + 数百毫秒 prefill 延迟)。对一个跑长 session、多次重新组装 tool 列表的 agent workflow(比如 MCP 动态加载工具),日积月累不容忽视。

更重要的是延迟:cache hit 走 KV cache 是几乎瞬时的 prefill,cache miss 要把 8000+ token 完整重算 —— 对响应延迟敏感的交互场景影响很大。

6. 修订上一份报告的结论

之前的报告里我写「单测 + tsx 端到端 26/26 pass」,那确实验证了代码逻辑没问题,但没击中 PR 想堵的真实缓存行为。这次真实 API 验证补齐了这一块:

维度 结论
PR 修复的 bug 是否真实存在 ✅ 完全复现(99.86% → 0%)
PR 的 sort 是否真的解决问题 ✅ 完全恢复(0% → 99.86%)
prompt_cache_hit_tokens 映射 ✅ converter 正确产出 cachedContentTokenCount = 8320
hostname 攻击面 ✅ 之前报告已验证
非 DeepSeek provider 行为 ✅ 之前报告已验证不受影响

结论:强烈建议合并。 这不只是一个理论 fix,而是一个有真实缓存命中率影响的 bug —— 对所有用 api.deepseek.com 的用户都会显著降低 prompt token 成本和首字延迟。

Repro 复现路径

cd /private/tmp/pr4518-test
export DEEPSEEK_API_KEY=<your-key>   # 或者读 ~/.qwen/settings.json env.DEEPSEEK_API_KEY_GMAIL
npx tsx scripts/deepseek-live-repro.mts

— wenshao

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Cleaned up the PR description to match the latest template and incorporated the live DeepSeek API validation results from the maintainer report. Code is unchanged; current CI is green across macOS, Windows, and Linux.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected

This PR shares modified code with PR #4505 (fix(core): emit enable_thinking on DashScope when reasoning is disabled).

Shared functions:

  • buildRequestpackages/core/src/core/openaiContentGenerator/pipeline.ts

Both PRs also modify files in the openaiContentGenerator directory:

  • pipeline.ts — shared buildRequest function
  • converter.ts — this PR modifies convertOpenAIResponseToGemini

Recommendation: These PRs both modify the OpenAI content generation pipeline. Coordinate with PR #4505 author before merging.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected

This PR shares modified code with PR #4505 (fix(core): emit enable_thinking on DashScope when reasoning is disabled).

Shared functions:

  • buildRequestpackages/core/src/core/openaiContentGenerator/pipeline.ts

Both PRs also modify files in the openaiContentGenerator directory:

  • pipeline.ts — shared buildRequest function
  • converter.ts — this PR modifies convertOpenAIResponseToGemini

Recommendation: These PRs both modify the OpenAI content generation pipeline. Coordinate with PR #4505 author before merging.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

interface ExtendedCompletionUsage extends OpenAI.CompletionUsage {
cached_tokens?: number;
prompt_cache_hit_tokens?: number;
prompt_cache_miss_tokens?: number;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] prompt_cache_miss_tokens is declared on ExtendedCompletionUsage but never read anywhere. getCachedPromptTokens only consumes prompt_cache_hit_tokens. The test fixture sets prompt_cache_miss_tokens: 105 but asserts nothing about it.

Dead interface surface — signals to future readers that this field is consumed when it is not. If a consumer is needed later (e.g., for cache-hit-rate observability), it should be added alongside the reading code.

Suggested change
prompt_cache_miss_tokens?: number;

(Remove this line and the corresponding prompt_cache_miss_tokens: 105 from the test fixture in converter.test.ts.)

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in cf0a82c. Removed the unused prompt_cache_miss_tokens extension field and the fixture value, since this PR only consumes cache-hit counts.

expect(response.candidates).toEqual([]);
});

it('maps DeepSeek prompt cache hit tokens into cached content token count', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This test covers the non-streaming path (convertOpenAIResponseToGemini), but the streaming path (convertOpenAIChunkToGemini) also calls getCachedPromptTokens and has no corresponding test for prompt_cache_hit_tokens.

DeepSeek's streaming API may include prompt_cache_hit_tokens in the final chunk's usage object. The streaming usage assembly has its own estimation fallback logic (totalTokens > 0 && promptTokens === 0 && completionTokens === 0) that interacts with cachedTokens — this interaction is untested.

Consider adding a mirror test that feeds a chunk with prompt_cache_hit_tokens through convertOpenAIChunkToGemini and asserts cachedContentTokenCount is populated.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cf0a82c. The streaming convertOpenAIChunkToGemini path now has coverage for provider-specific prompt_cache_hit_tokens in chunk usage metadata.

function getCachedPromptTokens(usage: OpenAI.CompletionUsage): number {
const extendedUsage = usage as ExtendedCompletionUsage;
return (
usage.prompt_tokens_details?.cached_tokens ??

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No test verifies the ?? precedence chain: prompt_tokens_details.cached_tokens > cached_tokens > prompt_cache_hit_tokens. This PR adds a third fallback branch, making the relative ordering meaningful.

No existing test sends multiple cache-token fields simultaneously. If a future refactor reorders the chain, the wrong value could silently take precedence without any test catching it.

Consider adding a test with usage: { prompt_tokens_details: { cached_tokens: 100 }, cached_tokens: 200, prompt_cache_hit_tokens: 300 } asserting cachedContentTokenCount === 100.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cf0a82c. The non-streaming cache test now covers the precedence chain with prompt_tokens_details.cached_tokens, top-level cached_tokens, and prompt_cache_hit_tokens, asserting the standard nested field wins.

// serialized prompt prefix even when the tool set and schemas are identical.
// Sort only for official DeepSeek endpoints to avoid surprising other
// OpenAI-compatible providers with changed tool presentation order.
if (isDeepSeekHostname(this.contentGeneratorConfig)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The tool sort is gated on isDeepSeekHostname() but KV-cache prefix-exactness is a property of the DeepSeek model architecture, not the endpoint hostname. Self-hosted DeepSeek deployments (sglang/vllm/ollama behind a proxy) share the same cache behavior but won't benefit from the sort.

The codebase already has isDeepSeekProvider() — which checks hostname or model name — for exactly this reason (see provider/deepseek.ts:58-64). The thinking: { type: 'disabled' } gate at line 405 is correctly hostname-scoped (wire-shape extension that self-hosted models may not accept), but the sort is a purely cache-optimization that is harmless for non-DeepSeek providers.

Consider using isDeepSeekProvider() here, or add a comment explaining why self-hosted DeepSeek is intentionally excluded from the cache optimization.

— qwen3.7-max via Qwen Code /review

@Jerry2003826 Jerry2003826 May 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5556284. Tool sorting now uses isDeepSeekProvider() so self-hosted DeepSeek models get the cache-stable ordering by model name, while the hostname-only thinking wire-shape gate is unchanged. Added a self-hosted DeepSeek regression test.

// serialized prompt prefix even when the tool set and schemas are identical.
// Sort only for official DeepSeek endpoints to avoid surprising other
// OpenAI-compatible providers with changed tool presentation order.
if (isDeepSeekProvider(this.contentGeneratorConfig)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Stale comment: the two lines above say "Sort only for official DeepSeek endpoints" but the gate here is isDeepSeekProvider, which also matches self-hosted DeepSeek deployments via model-name fallback (as confirmed by the sorts self-hosted DeepSeek tools by model name test). The comment was accurate when the gate was isDeepSeekHostname but was not updated when the gate was widened in this commit.

Suggested reword:

// DeepSeek's KV cache is prefix-exact: a different tool order changes the
// serialized prompt prefix even when the tool set and schemas are identical.
// Gate on isDeepSeekProvider so both official and self-hosted DeepSeek
// deployments benefit; non-DeepSeek providers keep caller order.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 349f7f2. Updated the comment to match the widened isDeepSeekProvider() gate: official and self-hosted DeepSeek deployments get cache-stable ordering, while non-DeepSeek providers keep caller order.

…-prefix-stability-fix

# Conflicts:
#	packages/core/src/core/openaiContentGenerator/pipeline.ts
).toEqual(['zeta', 'alpha']);
});

it('sorts self-hosted DeepSeek tools by model name before sending the wire request', async () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test title says "by model name" but the sort key is tool.function.name. The model name is the detection gate (via isDeepSeekProvider), not the sort criterion. The sibling test at line 427 is correctly titled "sorts DeepSeek tools by function name". Consider aligning: e.g., "sorts tools by function name for self-hosted DeepSeek (detected via model name)".

— qwen3.7-max via Qwen Code /review

(tool: OpenAI.Chat.ChatCompletionTool) => tool.function.name,
),
).toEqual(['alpha', 'bravo', 'zeta']);
expect(apiCall.thinking).toBeUndefined();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] expect(apiCall.thinking).toBeUndefined() asserts a hostname-gate invariant (self-hosted DeepSeek does not receive the thinking wire-shape parameter) that is unrelated to tool ordering. This is a valuable invariant but mixing it into a tool-sort test makes failures harder to diagnose and hides it from anyone looking for thinking-parameter tests. Consider extracting it into its own test (e.g., "does not inject thinking parameter for self-hosted DeepSeek detected by model name") alongside the existing enable_thinking tests.

— qwen3.7-max via Qwen Code /review

return 0;
}

function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] getToolSortKey builds a composite key [name, type, JSON.stringify(tool)] but tool.function.name alone is a sufficient total order — function names are unique within a tool set (the Gemini→OpenAI converter enforces this upstream). The JSON.stringify tiebreaker is invoked O(n log n) times per request and serializes the entire tool schema (often the largest part of the request body). Simplifying to return tool.function.name; is correct, cheaper, and clearer about the contract.

Suggested change
function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {
function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {
return tool.function.name;
}

— qwen3.7-max via Qwen Code /review


// DeepSeek's KV cache is prefix-exact: a different tool order changes the
// serialized prompt prefix even when the tool set and schemas are identical.
// Gate on isDeepSeekProvider so both official and self-hosted DeepSeek

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The comment explains what the gate does but not why the broad isDeepSeekProvider (hostname OR model name) is correct here while reasoning_effort in the same file uses the narrow isDeepSeekHostname. The rationale is that cache-prefix stability depends on the model's KV-cache behavior (not the server's wire-shape extensions), so a false positive here is harmless — reordering tools is a no-op for backends without prefix-exact caching. A future maintainer may mechanically "unify" the two gates to the narrow one and silently regress self-hosted cache hit rates. Consider adding one line: e.g., "A false positive is low-cost: reordering tools is a no-op for backends without prefix-exact caching, unlike reasoning_effort where a false positive pushes an unsupported body parameter."

— qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R6: All R5 suggestions addressed in f751874. Code is clean, well-tested, and correctly scoped. LGTM ✅ — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 27, 2026

Copy link
Copy Markdown
Collaborator

Maintainer test report

Tested PR head f7518745 on Linux (Debian 13, kernel 6.12, x86_64, Node v22.22.2, npm 10.9.7) — a platform not covered by the author's local environment. Driven from a tmux session in a dedicated git worktree.

Static checks

Check Command Result
Targeted unit tests npm run test --workspace=packages/core -- src/core/openaiContentGenerator/pipeline.test.ts src/core/openaiContentGenerator/converter.test.ts 150 / 150 passed (pipeline 45, converter 105)
Prettier npx prettier --check on the four touched files All formatted
Typecheck npm run typecheck --workspace=packages/core Clean
Build npm run build --workspace=packages/core Clean

Live DeepSeek API — premise: cache is prefix-exact wrt tool order

Direct calls to https://api.deepseek.com/chat/completions (deepseek-chat), same system + user prompt, identical set of 24 tool declarations, varying only in array order. prompt_tokens=3770 for all four runs.

Phase Tool order prompt_cache_hit_tokens
1. Warm canonical sorted 3712 (already warm from a prior call)
2. Repeat canonical sorted 3712
3. Shuffled, no sort deterministic shuffle (seed 1337) 896
4. Shuffled, then sort applied client-side using the PR's compare sorted 3712

Cache hits collapse from 3712 → 896 when tool order changes, and are fully restored to 3712 after applying the PR's sort. The 896-token residue in phase 3 is the system-prompt prefix that survives before the tool array diverges — confirms cache is prefix-exact and not whole-prompt-hashed, exactly as the PR description claims.

Live DeepSeek API — end-to-end through the actual built pipeline

Imported ContentGenerationPipeline from packages/core/dist at PR HEAD, instrumented chat.completions.create to capture the wire body, and ran two requests with the 10 tool declarations registered in shuffled order against https://api.deepseek.com/v1.

[1-shuffled-input-first]  sortedOnWire=true  promptTokens=1276  cachedContentTokenCount=0
[2-shuffled-input-repeat] sortedOnWire=true  promptTokens=1276  cachedContentTokenCount=1152

Wire body in both calls (tools[].function.name):

compute_md5_of_file, compute_sha256_of_file, count_words_in_file,
create_directory, fetch_url_as_json, grep_in_files, list_directory,
parse_json_string, render_markdown, spawn_subprocess

Two things validated together:

  1. The pipeline sorts tools by function name before the OpenAI client emits the body, even though they were registered shuffled.
  2. DeepSeek's prompt_cache_hit_tokens (1152) is mapped through converter.ts into usageMetadata.cachedContentTokenCount on the Gemini-shape response, on the non-streaming path. The unit tests added in this PR cover the streaming path with the same mapping.

Notes for reviewers

  • The isDeepSeekProvider gate is broad (hostname OR model-name contains deepseek), which intentionally covers self-hosted vLLM / sglang / ollama deployments. The narrower isDeepSeekHostname gate above it (for the thinking: { type: 'disabled' } rewrite) is correctly left untouched — those are two different decisions about wire shape vs. cache-prefix stability, and the inline comment in pipeline.ts calls that out.
  • The sort uses a stable lexicographic compare on tool.function.name. The set of names registered for a given Qwen Code session is known to be unique, so collisions on the sort key aren't a concern.
  • No behavior change for non-DeepSeek OpenAI-compatible providers — the new test preserves tool order for non-DeepSeek hostnames covers that, and the live run was scoped to DeepSeek only.

Verdict

Behaves as described on Linux, end to end. LGTM from a maintainer-validation standpoint.

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Sorting tools by function name for DeepSeek's prefix-exact KV cache is the right approach. isDeepSeekProvider (broad detection) is correctly used here instead of the narrower isDeepSeekHostname — cache-prefix stability applies to self-hosted DeepSeek too. The getCachedPromptTokens refactor with prompt_cache_hit_tokens fallback has clean precedence: standard > top-level > DeepSeek-specific.

@BZ-D

BZ-D commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

This PR has merge conflicts with main — please rebase or merge main to resolve before merging.

@tanzhenxin

tanzhenxin commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the PR and the thorough write-up, @Jerry2003826.

I'd like to close this pending a justifying issue, because we couldn't reproduce the problem the main change targets.

Tool-ordering change: the wire tools list isn't rebuilt per turn — it's captured once by setTools() and reused verbatim on every following request. setTools() only re-runs when the tool set changes (MCP discovery completing, a deferred tool being revealed via ToolSearch, skill/subagent swaps). In each of those the prefix changes because the set changed, which reordering can't recover. So within a session, two turns with the same tool set already send byte-identical tool ordering — there's no drift for the sort to fix.

The one case where an identical set can appear in a different order is across separate process starts, where concurrent MCP discovery registers tools in completion order. That's real but narrow: multi-MCP-server setups only, only the static prefix before the first user turn, and only across sessions — not the within-session cache where most of the savings are.

Could you file an issue first with a concrete repro — ideally before/after prompt_cache_hit_tokens numbers showing tool ordering actually costs cache hits in practice? With that we can confirm the problem and the right layer to fix it (sorting the registry output once may beat a DeepSeek-only wire patch). Happy to reopen once there's a repro.

@tanzhenxin tanzhenxin closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants