feat(cli): add tools.toolSearch.enabled setting for prefix-caching models #4069
… for prefix-caching models

ToolSearch (PR #3589) defers MCP tool loading to reduce prompt size, but breaks prefix-based KV caching for models like DeepSeek V4, where cached token pricing is 1/120 of uncached. Users reported cache hit rates dropping from ~98% to ~81% and 3x cost increases (discussion #4065).

Add a `tools.toolSearch.enabled` setting (default: true). Setting it to false disables ToolSearch by adding tool_search to the deny list, which triggers the existing eager-reveal fallback in client.ts. All deferred tools are then included in the initial declaration list, restoring prompt prefix stability.

Auto-disable ToolSearch for deepseek-v4-* models when the setting is not explicitly configured, since their extreme cache discount makes prefix stability far more valuable than the ~15K token savings from deferral. Users can override with `tools.toolSearch.enabled: true`.
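To make the fallback concrete, here is a minimal TypeScript sketch of the idea as the commit message describes it: when `tool_search` is on the deny list, every deferred tool is declared up front instead of being revealed later. The function `initialDeclarations`, the `ToolDecl` type, and the tool name `read_file` are hypothetical and are not taken from `client.ts`; the sketch only models the declaration order that matters for prefix caching.

```typescript
// Hypothetical sketch; these names are illustrative, not Qwen Code's API.
interface ToolDecl {
  name: string;
}

const TOOL_SEARCH = "tool_search";

// Returns the tool declarations sent with the first request of a session.
function initialDeclarations(
  coreTools: readonly ToolDecl[],
  deferredTools: readonly ToolDecl[],
  denyList: readonly string[],
): ToolDecl[] {
  if (denyList.includes(TOOL_SEARCH)) {
    // Eager-reveal fallback: without tool_search the model cannot discover
    // deferred tools, so all of them are declared immediately. The list then
    // stays fixed for the session, keeping the prompt prefix stable.
    return [...coreTools, ...deferredTools];
  }
  // ToolSearch enabled: only tool_search is declared; deferred tools are
  // added later, which changes the declaration part of the prompt.
  return [...coreTools, { name: TOOL_SEARCH }];
}

const core: ToolDecl[] = [{ name: "read_file" }];
const deferred: ToolDecl[] = [{ name: "monitor" }, { name: "task_stop" }];

console.log(initialDeclarations(core, deferred, [TOOL_SEARCH]).map((t) => t.name));
console.log(initialDeclarations(core, deferred, []).map((t) => t.name));
```

Routing the setting through the deny list is what lets the PR avoid new branches in core logic: the fallback for a missing `tool_search` already exists, and the setting only decides whether to trigger it.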
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
pomelo-nwu left a comment
The design is clean — reusing the existing eager-reveal fallback in client.ts with zero core changes is exactly the right approach. A few issues to address before merge:
🔴 Must-fix: lint failure
The CI lint step fails on "Check settings schema is up-to-date". After modifying settingsSchema.ts, you need to regenerate packages/vscode-ide-companion/schemas/settings.schema.json:
```shell
npm run generate:settings-schema
```

Commit the updated JSON schema file and the lint check will pass.
🟡 Missing tests
Only 2 files changed, no test files. The PR body mentions running existing config/schema tests, but there are no tests for the new logic. Please add coverage for at least these cases in config.test.ts:
- `tools.toolSearch.enabled: false` → `tool_search` appears in deny list
- `deepseek-v4-*` model (e.g. `deepseek-v4-flash`) → `tool_search` auto-added to deny list
- `enabled: true` explicitly overrides the auto-detection for deepseek models
- Non-deepseek model (e.g. `qwen`) does NOT trigger auto-disable
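The four cases above can be expressed as a table-driven check. This is a minimal sketch assuming a hypothetical resolver `toolSearchDenyEntries` that returns the extra deny-list entries; it is not the code in the PR, and in `config.test.ts` the cases would be Vitest tests rather than a plain loop. The model predicate is passed in so the sketch does not commit to a particular regex, and `isDeepSeekV4` is illustrative only.

```typescript
// Hypothetical resolver. `enabled` is the raw tools.toolSearch.enabled value,
// which is undefined when the user has not configured it.
function toolSearchDenyEntries(
  enabled: boolean | undefined,
  model: string,
  autoDisable: (model: string) => boolean,
): string[] {
  // An explicit setting always wins; auto-detection applies only when unset.
  const effective = enabled ?? !autoDisable(model);
  return effective ? [] : ["tool_search"];
}

// Illustrative predicate for this sketch only.
const isDeepSeekV4 = (model: string): boolean => /deepseek-v4/i.test(model);

interface Case {
  label: string;
  enabled: boolean | undefined;
  model: string;
  denied: boolean;
}

const cases: Case[] = [
  { label: "explicit false", enabled: false, model: "qwen3-coder-plus", denied: true },
  { label: "deepseek-v4 auto-disable", enabled: undefined, model: "deepseek-v4-flash", denied: true },
  { label: "explicit true overrides", enabled: true, model: "deepseek-v4-flash", denied: false },
  { label: "non-deepseek untouched", enabled: undefined, model: "qwen3-coder-plus", denied: false },
];

for (const c of cases) {
  const denied = toolSearchDenyEntries(c.enabled, c.model, isDeepSeekV4).includes("tool_search");
  console.log(`${c.label}: ${denied === c.denied ? "ok" : "FAIL"}`);
}
```

In this sketch the setting is read as `boolean | undefined` instead of being defaulted to `true` up front; that tri-state is what separates "explicitly enabled" from "not configured", which the override case depends on.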
🟡 Auto-detection scope
Two questions on the regex /^deepseek-v4/i:
- Model name prefix: `resolvedModel` may include a provider prefix like `openrouter/deepseek-v4-flash` (depending on how the user configured it). The `^` anchor would miss that. Consider using `/deepseek-v4/i` without the anchor, or checking whether `resolvedModel` already strips provider prefixes.
- Other DeepSeek models: `deepseek-chat` and `deepseek-v3` also have prefix-based KV caching on DeepSeek's API (same 10% cached pricing model). Should the auto-detection cover those too? At minimum, a comment explaining why only v4 is targeted would help future readers.
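The anchoring issue is easy to reproduce. The two patterns below are the ones quoted in the review; the model names are examples taken from this thread. If the resolved model string keeps a provider prefix, only the unanchored pattern matches it.

```typescript
const anchored = /^deepseek-v4/i;
const unanchored = /deepseek-v4/i;

const models = [
  "deepseek-v4-flash",
  "openrouter/deepseek-v4-flash",
  "DeepSeek/deepseek-v4-pro",
];

for (const m of models) {
  // Non-global regexes, so .test() has no lastIndex state between calls.
  console.log(`${m}: anchored=${anchored.test(m)} unanchored=${unanchored.test(m)}`);
}
```

Note that `DeepSeek/deepseek-v4-pro`, the name used for the manual verification later in this PR, also fails the anchored pattern.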
…e schema

- Remove `^` anchor from regex to handle provider-prefixed model names (e.g. `openrouter/deepseek/deepseek-chat`)
- Expand auto-detection to all DeepSeek models with prefix caching: `deepseek-v3`, `deepseek-v4-*`, `deepseek-chat`
- Add 6 tests covering: explicit disable, auto-detect for v3/v4/chat with provider prefix, non-deepseek skip, explicit enable override
- Regenerate `settings.schema.json` for vscode-ide-companion
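A pattern covering the three families named in this commit could look like the sketch below. The PR's actual regex is not shown on this page, so `PREFIX_CACHING_DEEPSEEK` and `isPrefixCachingDeepSeek` are assumptions for illustration. The pattern is unanchored so that provider prefixes still match.

```typescript
// Assumed pattern for illustration; the PR's actual regex may differ.
// No ^ anchor, so "openrouter/deepseek/deepseek-chat" still matches.
const PREFIX_CACHING_DEEPSEEK = /deepseek-(?:v3|v4|chat)/i;

function isPrefixCachingDeepSeek(model: string): boolean {
  return PREFIX_CACHING_DEEPSEEK.test(model);
}

const examples = [
  "deepseek-v3",
  "DeepSeek/deepseek-v4-pro",
  "openrouter/deepseek/deepseek-chat",
  "qwen3-coder-plus",
];

for (const m of examples) {
  console.log(`${m} -> ${isPrefixCachingDeepSeek(m)}`);
}
```

An unanchored substring match is deliberately loose: it tolerates any provider prefix, at the cost of also matching unrelated names that happen to contain one of these substrings.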
wenshao left a comment
No issues found. LGTM! ✅
Reviewed: 4 files / +110 lines. The tools.toolSearch.enabled configuration is well-implemented with comprehensive test coverage (6 test cases covering explicit false, deepseek-v3/v4/chat auto-detection with provider prefix, non-deepseek models, and explicit override). Schema definition is consistent across settingsSchema.ts and settings.schema.json.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
…dels (#4069)

* feat(cli): add tools.toolSearch.enabled setting to disable ToolSearch for prefix-caching models

  ToolSearch (PR #3589) defers MCP tool loading to reduce prompt size, but breaks prefix-based KV caching for models like DeepSeek V4 where cached token pricing is 1/120 of uncached. Users reported cache hit rates dropping from ~98% to ~81% and 3x cost increases (discussion #4065).

  Add a `tools.toolSearch.enabled` setting (default: true) that disables ToolSearch by adding tool_search to the deny list, triggering the existing eager-reveal fallback in client.ts. All deferred tools are then included in the initial declaration list, restoring prompt prefix stability.

  Auto-disable ToolSearch for deepseek-v4-* models when the setting is not explicitly configured, since their extreme cache discount makes prefix stability far more valuable than the ~15K token savings from deferral. Users can override with `tools.toolSearch.enabled: true`.

* fix: address PR review (expand model detection, add tests, regenerate schema)

  - Remove ^ anchor from regex to handle provider-prefixed model names (e.g. openrouter/deepseek/deepseek-chat)
  - Expand auto-detection to all DeepSeek models with prefix caching: deepseek-v3, deepseek-v4-*, deepseek-chat
  - Add 6 tests covering: explicit disable, auto-detect for v3/v4/chat with provider prefix, non-deepseek skip, explicit enable override
  - Regenerate settings.schema.json for vscode-ide-companion
Summary
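- Adds a `tools.toolSearch.enabled` setting to allow disabling ToolSearch, which restores prompt prefix stability for models with prefix-based KV caching
- Auto-disables ToolSearch for DeepSeek models with prefix caching (`deepseek-v3`, `deepseek-v4-*`, `deepseek-chat`), unless explicitly overridden
- Matches provider-prefixed model names (e.g. `openrouter/deepseek/deepseek-chat`)
- Reuses the existing eager-reveal fallback in `client.ts`, so no changes are needed to core logic

Closes #4065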
Context
ToolSearch (PR #3589) defers MCP tool loading to reduce prompt size (~15K tokens saved), but breaks prefix-based KV caching for providers like DeepSeek. Users reported:
How it works
Usage
```json
{
  "tools": {
    "toolSearch": {
      "enabled": false
    }
  }
}
```

For DeepSeek models, no configuration is needed: ToolSearch is auto-disabled by default. Override with `"enabled": true` if desired.

Test plan
- `vitest run src/config/settingsSchema.test.ts`: 17 passed
- `vitest run src/config/config.test.ts`: 207 passed (6 new), 2 skipped
- … (`settings.schema.json`)

Manual verification: tool list
Tested with `DeepSeek/deepseek-v4-pro` configured in `~/.qwen/settings.json`:

- `DeepSeek/deepseek-v4-pro` (default): … (`monitor`, `task_stop`, `send_message`, `exit_plan_mode`, etc.)
- `qwen3-coder-plus` (`--model`): …

Cache hit rate verification
Real API calls to `DeepSeek/deepseek-v4-pro` via the DeepSeek API, 3 independent single-turn sessions each:

ToolSearch DISABLED (this PR, auto-detect):
ToolSearch ENABLED (override with `enabled: true`):

With ToolSearch disabled, the cache hit rate is consistently ~98% from the first request. With ToolSearch enabled, the first request hits only 44.8% because the tool declarations vary, and it takes multiple requests for the cache to stabilize.
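A simplified model shows why varying tool declarations hurt the hit rate. For this sketch only, assume the provider reuses the longest common prefix with an earlier request and that tool declarations are serialized before the conversation; real providers may cache in fixed-size blocks and serialize requests differently. Prompts are modeled as arrays of opaque segments rather than tokens, and the turn contents are invented for illustration, so this shows the mechanism, not the measured numbers above.

```typescript
// Number of leading segments shared by two prompts.
function commonPrefixLength(a: readonly string[], b: readonly string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Fraction of `next` that a cache holding `prev` could serve.
function hitRate(prev: readonly string[], next: readonly string[]): number {
  return commonPrefixLength(prev, next) / next.length;
}

// ToolSearch enabled: turn 2 declares a newly revealed tool (monitor),
// which shifts every segment after the declarations.
const deferredTurn1 = ["system", "tool:read_file", "tool:tool_search", "user1"];
const deferredTurn2 = ["system", "tool:read_file", "tool:tool_search", "tool:monitor", "user1", "assistant1", "user2"];

// ToolSearch disabled: all tools are declared from turn 1, so turn 2 only appends.
const eagerTurn1 = ["system", "tool:read_file", "tool:monitor", "user1"];
const eagerTurn2 = ["system", "tool:read_file", "tool:monitor", "user1", "assistant1", "user2"];

console.log(`deferred: ${hitRate(deferredTurn1, deferredTurn2).toFixed(2)}`);
console.log(`eager:    ${hitRate(eagerTurn1, eagerTurn2).toFixed(2)}`);
```

In the deferred case the cached prefix stops at the first changed declaration, so the whole earlier conversation is re-billed as uncached; in the eager case the previous request is a strict prefix of the next one.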
At DeepSeek V4's 1/120 cache pricing, the 26% cache hit improvement translates to significant cost savings for heavy users.
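The input-side effect can be made concrete with a back-of-the-envelope formula. If a fraction h of input tokens hits the cache and cached tokens cost r times the uncached price, the effective input price relative to uncached is h * r + (1 - h). The sketch below plugs in r = 1/120 and the ~98% and ~81% hit rates from #4065. It covers input tokens only; output tokens and the smaller prompt that ToolSearch produces are not modeled, so its ratio should not be compared directly with the ~3x total cost increase users reported. `effectiveInputPriceFactor` is a name introduced here for illustration.

```typescript
// Effective input-token price as a fraction of the uncached price.
// hitRate: fraction of input tokens served from cache, in [0, 1]
// cachedPriceRatio: cached price / uncached price (1/120 per this PR)
function effectiveInputPriceFactor(hitRate: number, cachedPriceRatio: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return hitRate * cachedPriceRatio + (1 - hitRate);
}

const ratio = 1 / 120;
const stable = effectiveInputPriceFactor(0.98, ratio);
const degraded = effectiveInputPriceFactor(0.81, ratio);

console.log(`98% hits: ${(stable * 100).toFixed(1)}% of uncached price`);
console.log(`81% hits: ${(degraded * 100).toFixed(1)}% of uncached price`);
console.log(`input-side ratio: ${(degraded / stable).toFixed(1)}x`);
```

At these rates input tokens cost roughly 2.8% versus 19.7% of the uncached price. Because cached tokens are so cheap, the uncached remainder (1 - h) dominates, which is why a drop of a few points in hit rate moves the bill far more than the hit-rate numbers alone suggest.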
🤖 Generated with Claude Code