DeepSeek V4 Flash cache hit rate dropped from ~98% to ~81% after v0.15.10 (ToolSearch) #4065

TradingLaboratory · 2026-05-11T13:50:11Z

After updating Qwen Code CLI from v0.15.9 to v0.15.10, I noticed a dramatic drop in DeepSeek V4 Flash cached token ratio through OpenRouter. Here's the data:

Before v0.15.10 (9 days of data):

Date	Input tokens	Cached tokens	Uncached tokens	Cache %	Spend ($)	Qwen ver
May 1	1.4M	1.2M	0.2M	87.9%	0.03	v0.15.6
May 2	94.7M	93.4M	1.3M	98.5%	0.58	v0.15.6
May 3	240.3M	238.5M	1.8M	99.3%	1.21	v0.15.6
May 4	159.1M	156.4M	2.7M	98.3%	0.96	v0.15.6
May 5	154.2M	151.8M	2.4M	98.5%	1.15	v0.15.6
May 6	202.2M	195.3M	6.9M	96.6%	1.89	v0.15.6
May 7	124.3M	121.4M	2.9M	97.7%	0.99	v0.15.7
May 8	160.4M	155.1M	5.3M	96.7%	1.37	v0.15.8
May 9	27.0M	26.5M	0.5M	98.4%	0.23	v0.15.9
Average 1-9	129M	126M	~3M	~97.5%	$1.05	—

After v0.15.10:

Date	Input tokens	Cached tokens	Uncached tokens	Cache %	Spend ($)	Qwen ver
May 10	69.4M	56.5M	12.9M	81.5%	$3.30	v0.15.10

Impact summary

Metric	Average May 1-9	May 10	Change
Cache hit rate	97.5%	81.5%	−16.0 pp (−16.4% relative)
Uncached tokens per day	~3M	12.9M	+330%
Daily spend	$1.05	$3.30	+214%
Input tokens per day	129M	69.4M	−46% (less work!)

Despite processing only about half the tokens (46% fewer than my daily average), I paid roughly 3× more: $3.30 versus $1.05 per day. Uncached tokens jumped from ~3M to 12.9M per day, a 4.3× increase. This is a significant real-world cost impact.
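For anyone who wants to recheck the percentages, here is a small Python sketch that recomputes the hit rates from the daily rows above. It uses the rounded values shown in the tables, so results can differ from the dashboard figures by about 0.1 points. It also weights by tokens instead of averaging the daily percentages, which is an assumption about how the average should be taken.

```python
# Daily (input, cached) token counts in millions, copied from the tables above.
# Values are rounded as displayed, so the output is approximate.
before = [
    (1.4, 1.2), (94.7, 93.4), (240.3, 238.5), (159.1, 156.4), (154.2, 151.8),
    (202.2, 195.3), (124.3, 121.4), (160.4, 155.1), (27.0, 26.5),
]
after = (69.4, 56.5)  # May 10, v0.15.10


def hit_rate(input_tokens: float, cached_tokens: float) -> float:
    """Cache hit rate as a percentage of input tokens."""
    return 100.0 * cached_tokens / input_tokens


# Token-weighted rate for May 1-9: total cached over total input.
total_input = sum(i for i, _ in before)
total_cached = sum(c for _, c in before)
weighted_before = hit_rate(total_input, total_cached)
rate_after = hit_rate(*after)

print(f"May 1-9 token-weighted hit rate: {weighted_before:.1f}%")
print(f"May 10 hit rate: {rate_after:.1f}%")
print(f"Drop: {weighted_before - rate_after:.1f} percentage points")
```

Weighting by tokens keeps a tiny day like May 1 (1.4M tokens) from pulling the average down as much as it would in a plain mean of the daily percentages.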

Root cause analysis

The likely culprit is ToolSearch (PR #3589) introduced in v0.15.10.

Here's the mechanism:

Up to v0.15.9: All MCP tool declarations were embedded directly in the system prompt at the start of every request. This created a stable, identical prefix across all requests within a session.
DeepSeek's caching model: DeepSeek uses prefix-based KV caching, storing the beginning of the prompt on disk. If a subsequent request starts with exactly the same prefix, the cached portion is reused and billed at 10% of the normal input price. This requires byte-identical prefix matching.
What ToolSearch changes: PR #3589 (feat(tools): add ToolSearch for on-demand loading of deferred tool schemas) defers tool loading, so MCP tools are now loaded on demand via a ToolSearch call instead of being declared upfront. As a result, each request may carry a different set of loaded tools in its prompt prefix, which breaks prefix stability.
The result: The prompt prefix changes between requests → DeepSeek's prefix-based cache misses → far more tokens are billed at the full uncached rate → cost spikes.
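To make the mechanism concrete, here is a toy Python sketch of prefix matching. Everything in it is an assumption for illustration: the prompt layout (system text, then tool declarations, then history), the tool names, and the use of characters instead of tokens. Real caches work on tokens, and I have not checked how Qwen Code actually assembles its requests.

```python
import json


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(system: str, tools: list[str], history: str) -> str:
    # Hypothetical layout: system text, tool declarations, then the conversation.
    return system + json.dumps(tools) + history


def reusable_fraction(previous: str, current: str) -> float:
    """Share of the current prompt that a prefix cache could reuse."""
    return common_prefix_len(previous, current) / len(current)


SYSTEM = "You are a coding agent. " * 20
ALL_TOOLS = [f"mcp_tool_{i}" for i in range(30)]  # made-up tool names
HISTORY = "user: fix the bug\nassistant: looking...\n" * 50
NEXT_TURN = "user: next step\n"

# Stable case: every request declares the same full tool list.
stable_1 = build_prompt(SYSTEM, ALL_TOOLS, HISTORY)
stable_2 = build_prompt(SYSTEM, ALL_TOOLS, HISTORY + NEXT_TURN)

# Deferred case: the set of loaded tools grows between requests.
deferred_1 = build_prompt(SYSTEM, ALL_TOOLS[:3], HISTORY)
deferred_2 = build_prompt(SYSTEM, ALL_TOOLS[:5], HISTORY + NEXT_TURN)

print(f"stable:   {reusable_fraction(stable_1, stable_2):.0%} reusable")
print(f"deferred: {reusable_fraction(deferred_1, deferred_2):.0%} reusable")
```

In the deferred case the match stops at the first changed character in the tool list, so the entire conversation history after it has to be reprocessed even though the history itself did not change.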

This is a trade-off: ToolSearch saves ~15K tokens per request in prompt size, but at the cost of breaking prefix-based caching for models like DeepSeek that rely on stable prefixes. For heavy users of DeepSeek through OpenRouter, the savings from smaller prompts are dwarfed by the increased cost from cache misses.
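A back-of-the-envelope version of this trade-off, as a Python sketch. The 10% cached price and the ~15K saving come from the post above; the 100K-token prompt size is purely an assumed example, and output tokens are ignored.

```python
CACHED_PRICE_RATIO = 0.10  # cached input billed at 10% of the uncached price


def relative_input_cost(prompt_tokens: float, hit_rate: float) -> float:
    """Input cost of one request, in units of one uncached token's price."""
    cached = prompt_tokens * hit_rate
    uncached = prompt_tokens - cached
    return uncached + cached * CACHED_PRICE_RATIO


PROMPT_TOKENS = 100_000     # assumed prompt size, for illustration only
TOOLSEARCH_SAVING = 15_000  # ~15K tokens saved per request

# Larger prompt with a stable prefix vs. smaller prompt with a lower hit rate.
old_cost = relative_input_cost(PROMPT_TOKENS, 0.975)
new_cost = relative_input_cost(PROMPT_TOKENS - TOOLSEARCH_SAVING, 0.815)

print(f"inline tools: {old_cost:.0f} token-equivalents")
print(f"ToolSearch:   {new_cost:.0f} token-equivalents")
print(f"ratio:        {new_cost / old_cost:.2f}x")
```

Under these assumptions the smaller prompt still costs more, because each token that falls out of the cache is billed at ten times the cached price.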

Has anyone else observed this? Is there a way to disable ToolSearch or keep tool declarations stable for models that benefit from prefix caching?

pomelo-nwu · 2026-05-12T01:28:33Z

Maintainer

👋 @TradingLaboratory Thanks for this incredibly thorough analysis — the data is very compelling.

Your root cause diagnosis is correct. ToolSearch changes how tool declarations appear in the system prompt: instead of being stably inlined in the prefix for every request, they are now loaded on-demand, which breaks DeepSeek's prefix-based KV caching.

There is currently no option to disable ToolSearch. We'll add a config option (e.g. toolSearch: false) that restores the old behavior of inlining all tool declarations directly into the system prompt prefix, keeping it stable across requests for models that benefit from prefix caching.
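As a rough illustration of what "stable across requests" needs in practice, here is a Python sketch of inlining tool declarations deterministically. This is not Qwen Code's implementation: the function name, the tool names, and the schema shape are all hypothetical, and the only point is that sorted names and sorted keys give the same bytes regardless of the order in which tools were registered.

```python
import json


def inline_tool_declarations(base_prompt: str, tools: dict[str, dict]) -> str:
    """Hypothetical helper: append every tool schema in a deterministic order."""
    declarations = [
        # sort_keys makes the serialized schema independent of dict ordering.
        json.dumps({"name": name, **tools[name]}, sort_keys=True)
        for name in sorted(tools)
    ]
    return base_prompt + "\n" + "\n".join(declarations) + "\n"


# The same two made-up tools, registered in different orders.
tools_a = {
    "read_file": {"description": "Read a file", "parameters": {"path": "string"}},
    "grep": {"parameters": {"pattern": "string"}, "description": "Search files"},
}
tools_b = {
    "grep": {"description": "Search files", "parameters": {"pattern": "string"}},
    "read_file": {"parameters": {"path": "string"}, "description": "Read a file"},
}

prompt_a = inline_tool_declarations("You are a coding agent.", tools_a)
prompt_b = inline_tool_declarations("You are a coding agent.", tools_b)
print(prompt_a == prompt_b)
```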

I'll track this in an issue. Thanks again for the detailed report and cost data — this is really helpful.

pomelo-nwu · 2026-05-12T04:54:37Z

Maintainer

@TradingLaboratory This is now tracked in #4069.

1 reply

TradingLaboratory May 12, 2026
Author

@pomelo-nwu Thank you for the fast turnaround and the clear plan! Really great to see the team act on community feedback so quickly. The toolSearch: false option is exactly what I was hoping for. 🚀

Happy to test or benchmark once it's ready. Let's keep making Qwen Code even better together!
