Add DeepSeek cache-aware prompt design and /cache inspect diagnostics by wplll · Pull Request #1196 · Hmbown/DeepSeek-TUI

wplll · 2026-05-08T10:54:28Z

Summary

This PR adds DeepSeek prompt cache awareness to the TUI and introduces a new /cache inspect command for diagnosing cache-related prompt structure.

The main goal is to make DeepSeek context caching easier to reason about by separating stable reusable prompt prefixes from session history and dynamic request content.

Changes

DeepSeek cache-aware prompt structure

Introduced a cache-oriented prompt layering model.
Separates prompt content into clearer categories:
static base prefix
session history
dynamic request content
Keeps reusable prompt components at the front of the request where they are most likely to benefit from DeepSeek context caching.
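
The layering above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `LayerKind`, `Layer`, `order_for_cache`, and `static_prefix` are hypothetical names chosen for the example, and only the three categories (static, history, dynamic) come from the description.

```rust
/// Stability classes named in the PR description.
/// Declaration order doubles as the desired wire order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LayerKind {
    Static,  // reusable base prefix
    History, // prior turns in this session
    Dynamic, // content specific to the current request
}

/// One rendered piece of the prompt (hypothetical type for illustration).
#[derive(Debug, Clone)]
pub struct Layer {
    pub name: String,
    pub kind: LayerKind,
    pub text: String,
}

/// Order layers static -> history -> dynamic.
/// `sort_by_key` is a stable sort, so relative order inside each class is kept.
pub fn order_for_cache(mut layers: Vec<Layer>) -> Vec<Layer> {
    layers.sort_by_key(|l| l.kind);
    layers
}

/// Concatenate the leading static layers, i.e. the part of the request
/// most likely to be shared with the previous request.
pub fn static_prefix(layers: &[Layer]) -> String {
    layers
        .iter()
        .take_while(|l| l.kind == LayerKind::Static)
        .map(|l| l.text.as_str())
        .collect()
}
```

Because prefix caching only helps while the leading bytes of a request stay identical, anything that changes per turn is pushed behind the static layers in this sketch.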

Cache usage metrics

Parses DeepSeek cache usage fields when available:
prompt_cache_hit_tokens
prompt_cache_miss_tokens
Computes and displays cache hit information for recent requests.
Handles providers or responses without cache fields gracefully.
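
A hit ratio derived from these two fields might look like the sketch below. The field names match the ones listed above; the `CacheUsage` struct, `hit_ratio`, and `footer_label` are assumptions made for illustration, and the label format is not taken from the PR.

```rust
/// Cache usage fields as they might arrive from a response.
/// Both are optional because some providers or responses omit them.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct CacheUsage {
    pub prompt_cache_hit_tokens: Option<u64>,
    pub prompt_cache_miss_tokens: Option<u64>,
}

impl CacheUsage {
    /// Hit ratio in [0, 1]. Returns None when either field is missing
    /// or when no prompt tokens were reported at all.
    pub fn hit_ratio(&self) -> Option<f64> {
        let hit = self.prompt_cache_hit_tokens?;
        let miss = self.prompt_cache_miss_tokens?;
        let total = hit.checked_add(miss)?;
        if total == 0 {
            return None;
        }
        Some(hit as f64 / total as f64)
    }

    /// Short label for a status line (hypothetical format).
    pub fn footer_label(&self) -> String {
        match self.hit_ratio() {
            Some(r) => format!("cache hit {:.1}%", r * 100.0),
            None => "cache n/a".to_string(),
        }
    }
}
```

Returning `None` instead of `0.0` keeps "no cache data" distinguishable from "everything missed", which is one way to handle responses without cache fields gracefully.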

`/cache inspect`

Adds a /cache inspect command to inspect the rendered prompt structure without printing the full prompt text.
Displays SHA-256 hashes for rendered prompt layers.
Adds:
Base static prefix hash
Full request prefix hash
static prefix stability status
first divergence from the previous request
Classifies rendered layers as:
static
history
dynamic
Helps distinguish expected request-history changes from actual static-prefix instability.
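
The first-divergence and stability checks can be illustrated with the sketch below. It is not the PR's implementation: the function names are hypothetical, and because the Rust standard library has no SHA-256, the example swaps in `DefaultHasher` as a stand-in fingerprint. A real build would presumably use a SHA-256 crate, as the description states.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint. NOT SHA-256: used here only so the example
/// has no external dependencies.
pub fn fingerprint(text: &str) -> u64 {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    h.finish()
}

/// Index of the first layer whose fingerprint differs between the previous
/// and current request, or None if both requests have identical layers.
pub fn first_divergence(prev: &[&str], curr: &[&str]) -> Option<usize> {
    let shared = prev.len().min(curr.len());
    for i in 0..shared {
        if fingerprint(prev[i]) != fingerprint(curr[i]) {
            return Some(i);
        }
    }
    // One request is a strict extension of the other: diverges where it ends.
    if prev.len() != curr.len() {
        Some(shared)
    } else {
        None
    }
}

/// The static prefix counts as stable when the first divergence, if any,
/// falls at or after the end of the static layers.
pub fn static_prefix_stable(prev: &[&str], curr: &[&str], static_layers: usize) -> bool {
    match first_divergence(prev, curr) {
        None => true,
        Some(i) => i >= static_layers,
    }
}
```

With two static layers, a request that only appends a new history layer diverges at index 3 and is reported as stable, while an edit to the first layer diverges at index 0 and is reported as unstable.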

Safety and privacy

/cache inspect does not print full prompt contents by default.
Hashes are used for diagnostics instead of exposing full rendered prompt text.

Motivation

DeepSeek context caching can significantly reduce input cost when requests share a stable prefix. However, before this change it was difficult to tell whether cache misses were caused by changes in the reusable static prefix or by normal conversation history growth.

This PR adds both the cache-aware prompt design and the inspection tooling needed to debug cache behavior in real sessions.

In particular, /cache inspect makes it possible to verify that the static base prefix remains stable across turns while allowing the full rendered request to change as history, tool results, and user inputs evolve.

Example behavior

Across multiple turns in the same session, /cache inspect can now show:

the same Base static prefix hash
a different Full request prefix hash on each turn
Static base prefix stability: OK
the first changed layer in the session history or dynamic request area

This makes it easier to confirm that the reusable DeepSeek cache prefix is stable even when the full request changes.

Testing

Manual verification performed:

Asked multiple questions in the same session.
Ran /cache inspect after each request.
Confirmed Base static prefix hash remains stable across turns.
Confirmed Full request prefix hash changes as history grows.
Confirmed cache hit / miss metrics are displayed when DeepSeek returns cache usage fields.
Confirmed full prompt text is not printed by /cache inspect.

gemini-code-assist

Code Review

This PR implements a "Project Context Pack" feature that creates a deterministic workspace summary for the system prompt, alongside new /cache inspect and /cache warmup debug commands for analyzing prompt stability and priming provider caches. It also enhances the TUI footer with detailed cache telemetry. Feedback recommends expanding the ignored directory list (e.g., target, .vscode) and increasing the variety of recognized configuration and source file extensions to improve the context pack's efficiency and coverage.

wplll · 2026-05-08T17:02:14Z

Tool result budget and deduplication

Adds wire-only budgeting for tool result messages sent to DeepSeek.

Large tool outputs are now compacted before they enter the rendered API request while preserving the full output in the local UI/session state. By default, a single tool result keeps up to 12,000 chars, including:

tool name
command/query
exit status
original char count
SHA-256 hash
first 4,000 chars
last 4,000 chars
an explicit truncation placeholder for omitted middle content
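
A head/tail compaction of this kind could be sketched as below. The 12,000 and 4,000 figures come from the description; the function name, the exact placeholder text, and the decision to treat 12,000 as a simple threshold on the raw output are assumptions for illustration. The header fields (tool name, exit status, hash) are left out to keep the example short.

```rust
pub const MAX_CHARS: usize = 12_000;
pub const HEAD_CHARS: usize = 4_000;
pub const TAIL_CHARS: usize = 4_000;

/// Returns the text to send on the wire and whether it was truncated.
/// Counts characters, not bytes, so multi-byte UTF-8 is never split.
/// The caller keeps the original `output` untouched for local state.
pub fn compact_for_wire(output: &str, max_chars: usize, head: usize, tail: usize) -> (String, bool) {
    let total = output.chars().count();
    if total <= max_chars || head + tail >= total {
        return (output.to_string(), false);
    }
    // Byte offset where the head ends (start of the `head`-th character).
    let head_end = output
        .char_indices()
        .nth(head)
        .map(|(i, _)| i)
        .unwrap_or(output.len());
    // Byte offset where the tail starts.
    let tail_start = output
        .char_indices()
        .nth(total - tail)
        .map(|(i, _)| i)
        .unwrap_or(output.len());
    let omitted = total - head - tail;
    let compact = format!(
        "{}\n[... {} chars omitted ...]\n{}",
        &output[..head_end],
        omitted,
        &output[tail_start..]
    );
    (compact, true)
}
```

A call such as `compact_for_wire(&raw, MAX_CHARS, HEAD_CHARS, TAIL_CHARS)` would leave short outputs unchanged and reduce long ones to head, placeholder, and tail.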

Repeated identical tool results are deduplicated in the rendered wire history using a compact stable reference instead of resending the full output.

This reduces cache miss pressure from large repeated dynamic tool messages without deleting local logs or changing the visible transcript.

/cache inspect now also reports tool result budget metadata:

original chars
sent chars
truncated: true/false
deduplicated: true/false

Turn metadata deduplication

Adds wire-only deduplication for repeated <turn_meta> blocks.

The first rendered <turn_meta> block is kept in full. If a later <turn_meta> block is identical to the most recent full one, the rendered API request replaces it with a stable reference:

<TURN_META_REF sha="..." original_chars="..." />

If the metadata changes, the full block is sent again and becomes the new comparison point.
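
The comparison rule described above can be sketched as follows. Only the `<TURN_META_REF ... />` shape is taken from the description; `dedup_turn_meta` and `fingerprint_hex` are hypothetical names, and the `sha` value here comes from `DefaultHasher` as a dependency-free stand-in rather than real SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest. NOT SHA-256: chosen only to keep the example std-only.
fn fingerprint_hex(text: &str) -> String {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Build the wire-only view of a sequence of <turn_meta> blocks.
/// The input slice is borrowed immutably, so session messages are not mutated.
pub fn dedup_turn_meta(blocks: &[String]) -> Vec<String> {
    // The most recent block that was sent in full.
    let mut last_full: Option<&str> = None;
    let mut wire = Vec::with_capacity(blocks.len());
    for block in blocks {
        if last_full == Some(block.as_str()) {
            // Identical to the last full block: send a compact reference.
            wire.push(format!(
                "<TURN_META_REF sha=\"{}\" original_chars=\"{}\" />",
                fingerprint_hex(block),
                block.chars().count()
            ));
        } else {
            // New or changed metadata: send in full and reset the comparison point.
            wire.push(block.clone());
            last_full = Some(block.as_str());
        }
    }
    wire
}
```

For the sequence A, A, B, A this yields A in full, a reference, B in full, and A in full again, since the comparison point after B is B rather than the earlier A.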

This keeps repeated per-turn metadata from inflating multi-turn request payloads while preserving:

full UI transcript content
original session history
local saved session messages

/cache inspect now reports turn metadata diagnostics:

turn_meta original chars
turn_meta sent chars
turn_meta deduplicated: true/false
turn_meta sha256

Additional automated verification:

Added tests that oversized tool outputs are truncated only in rendered wire messages.
Added tests that identical tool outputs are replaced by compact references on repeated use.
Added tests that local/session tool output remains unchanged.
Added tests that repeated <turn_meta> blocks are replaced by stable refs in rendered wire messages.
Added tests that changed <turn_meta> blocks are sent in full.
Added tests that original session messages are not mutated by <turn_meta> deduplication.
Added tests that /cache inspect displays tool result and turn metadata budget/dedup diagnostics.

Commands run:

cargo fmt --check
cargo check
cargo clippy --workspace --all-targets --all-features
cargo test -p deepseek-tui turn_meta
cargo test -p deepseek-tui cache_inspect
cargo test

cargo test still reports pre-existing Python/REPL runtime failures that are unrelated to this PR's cache rendering changes; the new cache, tool result, and turn_meta tests pass.

…tion (Hmbown#1196)

Merge of PR Hmbown#1196 by wplll. Adds:

Cache-aware prompt layering:
- PromptBuilder struct separates prompt construction from inspection
- System prompt split into named layers with stability classification
- Layers classified as static/history/dynamic for cache debugging

/cache inspect command:
- SHA-256 hashes of each rendered prompt layer
- Base static prefix hash vs full request prefix hash
- Static prefix stability status across turns
- First-divergence tracking from previous request

Wire payload optimization:
- Tool result budget: large outputs compacted before API request
- Tool result dedup: repeated outputs replaced by compact refs
- Turn metadata dedup: repeated <turn_meta> blocks deduplicated
- Wire-only: local session messages remain unchanged

Project context pack:
- Deterministic workspace summary injected into stable prefix
- Configurable via [context] project_pack = false

Cache warmup and improved footer cache display.

Thanks to wplll for the contribution.

…fault

CHANGELOG additions:
- Top-line credit summary: wplll, Liu-Vince, Giggitycountless, SamhandsomeLee, barjatiyasaurabh, tyculw, hongyuatcufe, ljlbit.
- New "Added" section properly documenting Hmbown#1196 (cache-aware diagnostics, /cache inspect, /cache warmup, payload optimization, Project Context Pack). Calls out that the Pack is default-on, adds ~1–10 KB to every prompt, and how to opt out via [context] project_pack = false.
- Per-item issue reporter credits across the Fixed section.
- Removed Hmbown#1129 from the i18n entry; that's a separate bug we did not actually fix (wrong env var name in HTTP system prompt).

README updates: rewrote the "What's New" section in both README.md and README.zh-CN.md to v0.8.24 with all the same credits and the project_pack opt-out note.

improve cache inspect diagnostics

a3ad442

gemini-code-assist Bot reviewed May 8, 2026


Comment thread crates/tui/src/project_context.rs


wplll and others added 4 commits May 8, 2026 18:59

fix: resolve debug command merge conflict

634b062

Merge branch 'main' into feat/deepseek-cache-inspect

f95a7be

chore(cache): refine project context filters

cb8fd18

perf(tui): deduplicate repeated turn metadata for DeepSeek cache

c1309bc

This was referenced May 8, 2026

Input cache hit rate is too low (输入缓存命中率太低了) #1177

Open

Feature: DeepSeek cache-aware prompt diagnostics and wire payload optimization #1253

Open

Hmbown mentioned this pull request May 9, 2026

chore(release): prepare v0.8.24 #1283

Merged

4 tasks

wplll wants to merge 5 commits into Hmbown:main from wplll:feat/deepseek-cache-inspect
