
Add DeepSeek cache-aware prompt design and /cache inspect diagnostics #1196

Open
wplll wants to merge 5 commits into Hmbown:main from wplll:feat/deepseek-cache-inspect

wplll commented May 8, 2026

Summary


This PR adds DeepSeek prompt cache awareness to the TUI and introduces a new /cache inspect command for diagnosing cache-related prompt structure.

The main goal is to make DeepSeek context caching easier to reason about by separating stable reusable prompt prefixes from session history and dynamic request content.


Changes



DeepSeek cache-aware prompt structure



  • Introduces a cache-oriented prompt layering model.
  • Separates prompt content into clearer categories:
      • static base prefix
      • session history
      • dynamic request content
  • Keeps reusable prompt components at the front of the request, where they are most likely to benefit from DeepSeek context caching.
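
The layering described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: `LayerKind`, `Layer`, `order_for_cache`, and `render` are hypothetical names, and the sample layer texts are made up.

```rust
/// Stability class of a rendered prompt layer (hypothetical type; the three
/// categories mirror the ones listed in this PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum LayerKind {
    Static,  // reusable base prefix
    History, // earlier turns of the session
    Dynamic, // per-request content
}

#[derive(Debug, Clone)]
struct Layer {
    name: String,
    kind: LayerKind,
    text: String,
}

/// Put static layers first, then history, then dynamic content.
/// `sort_by_key` is a stable sort, so layers keep their relative order
/// within each kind.
fn order_for_cache(mut layers: Vec<Layer>) -> Vec<Layer> {
    layers.sort_by_key(|l| l.kind);
    layers
}

/// Join the layer texts into one request body (illustrative separator).
fn render(layers: &[Layer]) -> String {
    layers
        .iter()
        .map(|l| l.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let layers = vec![
        Layer { name: "user_input".into(), kind: LayerKind::Dynamic, text: "What does main.rs do?".into() },
        Layer { name: "system".into(), kind: LayerKind::Static, text: "You are a coding assistant.".into() },
        Layer { name: "turn_1".into(), kind: LayerKind::History, text: "user: hi".into() },
    ];
    let ordered = order_for_cache(layers);
    for l in &ordered {
        println!("{:?} {}", l.kind, l.name);
    }
    println!("{}", render(&ordered));
}
```

Because prefix caching only helps while the leading part of the request is unchanged, any layer that changes per turn is kept behind the layers that do not.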
    

Cache usage metrics



  • Parses DeepSeek cache usage fields when available:
      • prompt_cache_hit_tokens
      • prompt_cache_miss_tokens
  • Computes and displays cache hit information for recent requests.
  • Handles providers or responses without cache fields gracefully.
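
As a rough sketch of the hit computation, assuming the two usage fields have already been extracted from the response: the `CacheUsage` struct, its methods, and the label format below are hypothetical and only illustrate how missing fields can be handled without failing.

```rust
/// Cache counters as they might be read from a response's usage object.
/// Both are optional because not every provider returns them.
#[derive(Debug, Clone, Copy, Default)]
struct CacheUsage {
    prompt_cache_hit_tokens: Option<u64>,
    prompt_cache_miss_tokens: Option<u64>,
}

impl CacheUsage {
    /// Fraction of prompt tokens served from cache.
    /// Returns None when either field is missing or the total is zero.
    fn hit_ratio(&self) -> Option<f64> {
        let hit = self.prompt_cache_hit_tokens?;
        let miss = self.prompt_cache_miss_tokens?;
        let total = hit.checked_add(miss)?;
        if total == 0 {
            return None;
        }
        Some(hit as f64 / total as f64)
    }

    /// Short display label (the wording is an assumption, not the TUI's).
    fn label(&self) -> String {
        match self.hit_ratio() {
            Some(r) => format!("cache hit {:.1}%", r * 100.0),
            None => "cache n/a".to_string(),
        }
    }
}

fn main() {
    let with_cache = CacheUsage {
        prompt_cache_hit_tokens: Some(750),
        prompt_cache_miss_tokens: Some(250),
    };
    let without_cache = CacheUsage::default();
    println!("{}", with_cache.label());
    println!("{}", without_cache.label());
}
```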
    

/cache inspect



  • Adds a /cache inspect command to inspect the rendered prompt structure without printing the full prompt text.
  • Displays SHA-256 hashes for rendered prompt layers.
  • Adds:
      • Base static prefix hash
      • Full request prefix hash
      • static prefix stability status
      • first divergence from the previous request
  • Classifies rendered layers as:
      • static
      • history
      • dynamic
  • Helps distinguish expected request-history changes from actual static-prefix instability.
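
The stability check and first-divergence report can be illustrated with a small comparison over per-layer digests. This is a sketch under stated assumptions: all type and function names are hypothetical, and `digest` uses the standard library's `DefaultHasher` purely as a stand-in so the example needs no external crate, whereas the PR reports SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayerKind {
    Static,
    History,
    Dynamic,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct LayerDigest {
    kind: LayerKind,
    hash: String,
}

/// Stand-in digest for illustration only (NOT SHA-256).
fn digest(text: &str) -> String {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Hash every layer so the report never has to carry prompt text.
fn snapshot(layers: &[(LayerKind, &str)]) -> Vec<LayerDigest> {
    layers
        .iter()
        .map(|(kind, text)| LayerDigest { kind: *kind, hash: digest(text) })
        .collect()
}

#[derive(Debug, PartialEq, Eq)]
struct InspectReport {
    static_prefix_stable: bool,
    first_divergence: Option<usize>, // index of the first changed layer
}

fn static_hashes(layers: &[LayerDigest]) -> Vec<&str> {
    layers
        .iter()
        .filter(|l| l.kind == LayerKind::Static)
        .map(|l| l.hash.as_str())
        .collect()
}

fn compare(prev: &[LayerDigest], curr: &[LayerDigest]) -> InspectReport {
    // First index where the layers differ; if one request is a strict
    // prefix of the other, divergence starts where the shorter one ends.
    let first_divergence = prev
        .iter()
        .zip(curr.iter())
        .position(|(a, b)| a != b)
        .or_else(|| (prev.len() != curr.len()).then(|| prev.len().min(curr.len())));
    InspectReport {
        static_prefix_stable: static_hashes(prev) == static_hashes(curr),
        first_divergence,
    }
}

fn main() {
    use LayerKind::*;
    let turn1 = snapshot(&[(Static, "system prompt"), (History, "user: hi"), (Dynamic, "user: what is 2+2?")]);
    let turn2 = snapshot(&[
        (Static, "system prompt"),
        (History, "user: hi"),
        (History, "user: what is 2+2? / assistant: 4"),
        (Dynamic, "user: and 3+3?"),
    ]);
    println!("{:?}", compare(&turn1, &turn2));
}
```

A divergence that first appears in a history or dynamic layer while the static hashes match is the expected case; a divergence at a static layer points to real prefix instability.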
    

Safety and privacy



  • /cache inspect does not print full prompt contents by default.
  • Hashes are used for diagnostics instead of exposing full rendered prompt text.
    

Motivation


DeepSeek context caching can significantly reduce input cost when requests share a stable prefix. However, before this change it was difficult to tell whether cache misses were caused by changes in the reusable static prefix or by normal conversation history growth.

This PR adds both the cache-aware prompt design and the inspection tooling needed to debug cache behavior in real sessions.

In particular, /cache inspect makes it possible to verify that the static base prefix remains stable across turns while allowing the full rendered request to change as history, tool results, and user inputs evolve.


Example behavior


Across multiple turns in the same session, /cache inspect can now show:


  • the same Base static prefix hash
  • a different Full request prefix hash
  • Static base prefix stability: OK
  • the first changed layer in the session history or dynamic request area
    
This makes it easier to confirm that the reusable DeepSeek cache prefix is stable even when the full request changes.
    

Testing


Manual verification performed:


  • Asked multiple questions in the same session.
  • Ran /cache inspect after each request.
  • Confirmed Base static prefix hash remains stable across turns.
  • Confirmed Full request prefix hash changes as history grows.
  • Confirmed cache hit / miss metrics are displayed when DeepSeek returns cache usage fields.
  • Confirmed full prompt text is not printed by /cache inspect.
    


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This PR implements a "Project Context Pack" feature that creates a deterministic workspace summary for the system prompt, alongside new /cache inspect and /cache warmup debug commands for analyzing prompt stability and priming provider caches. It also enhances the TUI footer with detailed cache telemetry. Feedback recommends expanding the ignored directory list (e.g., target, .vscode) and increasing the variety of recognized configuration and source file extensions to improve the context pack's efficiency and coverage.

Comment thread crates/tui/src/project_context.rs
Comment thread crates/tui/src/project_context.rs
Comment thread crates/tui/src/project_context.rs

wplll commented May 8, 2026

Tool result budget and deduplication

Adds wire-only budgeting for tool result messages sent to DeepSeek.

Large tool outputs are now compacted before they enter the rendered API request while preserving the full output in the local UI/session state. By default, a single tool result keeps up to 12,000 chars, including:

  • tool name
  • command/query
  • exit status
  • original char count
  • SHA-256 hash
  • first 4,000 chars
  • last 4,000 chars
  • an explicit truncation placeholder for omitted middle content

Repeated identical tool results are deduplicated in the rendered wire history using a compact stable reference instead of resending the full output.

This reduces cache miss pressure from large repeated dynamic tool messages without deleting local logs or changing the visible transcript.
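
A minimal sketch of head/tail compaction plus deduplication, using the sizes quoted above. Several details are assumptions made for illustration: that compaction triggers once the output exceeds 12,000 chars, the placeholder and reference formats, all names, and the use of `DefaultHasher` as a stand-in for SHA-256. The metadata header (tool name, command, exit status) is omitted for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

const MAX_CHARS: usize = 12_000; // assumed trigger for compaction
const HEAD_CHARS: usize = 4_000;
const TAIL_CHARS: usize = 4_000;

/// Stand-in digest for illustration only (NOT SHA-256).
fn digest(text: &str) -> String {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Wire rendering plus the kind of metadata /cache inspect reports.
#[derive(Debug, PartialEq, Eq)]
struct Budgeted {
    wire_text: String,
    original_chars: usize,
    sent_chars: usize,
    truncated: bool,
    deduplicated: bool,
}

/// Builds the wire form of one tool result. `output` is only borrowed,
/// so the locally stored copy is never modified.
fn budget_tool_result(output: &str, seen: &mut HashSet<String>) -> Budgeted {
    let original_chars = output.chars().count();
    let hash = digest(output);
    let (wire_text, truncated, deduplicated) = if !seen.insert(hash.clone()) {
        // Identical output was already sent: emit a compact reference.
        (format!("[tool result repeated: hash={hash} chars={original_chars}]"), false, true)
    } else if original_chars > MAX_CHARS {
        // Keep the head and tail, replace the middle with a placeholder.
        let head: String = output.chars().take(HEAD_CHARS).collect();
        let tail: String = output.chars().skip(original_chars - TAIL_CHARS).collect();
        let omitted = original_chars - HEAD_CHARS - TAIL_CHARS;
        (format!("{head}\n[... {omitted} chars omitted, hash={hash} ...]\n{tail}"), true, false)
    } else {
        (output.to_string(), false, false)
    };
    let sent_chars = wire_text.chars().count();
    Budgeted { wire_text, original_chars, sent_chars, truncated, deduplicated }
}

fn main() {
    let mut seen = HashSet::new();
    let big = "x".repeat(20_000);
    let first = budget_tool_result(&big, &mut seen);
    let second = budget_tool_result(&big, &mut seen);
    println!("first: sent {} of {} chars", first.sent_chars, first.original_chars);
    println!("second: deduplicated = {}", second.deduplicated);
}
```

Counting and slicing by `char` rather than by byte avoids splitting a multi-byte UTF-8 sequence at the cut points.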

/cache inspect now also reports tool result budget metadata:

  • original chars
  • sent chars
  • truncated: true/false
  • deduplicated: true/false

Turn metadata deduplication

Adds wire-only deduplication for repeated <turn_meta> blocks.

The first rendered <turn_meta> block is kept in full. If a later <turn_meta> block is identical to the most recent full one, the rendered API request replaces it with a stable reference:

<TURN_META_REF sha="..." original_chars="..." />

If the metadata changes, the full block is sent again and becomes the new comparison point.
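
The comparison rule described above (compare against the most recent block sent in full) might look like the following sketch. `TurnMetaDeduper` and the sample `<turn_meta>` contents are hypothetical, and `digest` again uses `DefaultHasher` as a stand-in for the SHA-256 value shown in the reference tag.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest for illustration only (NOT SHA-256).
fn digest(text: &str) -> String {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Tracks the most recent <turn_meta> block that was sent in full.
#[derive(Debug, Default)]
struct TurnMetaDeduper {
    last_full: Option<String>,
}

impl TurnMetaDeduper {
    /// Returns the wire form of `block`. The input is only borrowed, so the
    /// original session message is left untouched.
    fn render(&mut self, block: &str) -> String {
        if self.last_full.as_deref() == Some(block) {
            // Same as the last full block: send a stable reference instead.
            format!(
                "<TURN_META_REF sha=\"{}\" original_chars=\"{}\" />",
                digest(block),
                block.chars().count()
            )
        } else {
            // Changed (or first) block: send in full and make it the new
            // comparison point.
            self.last_full = Some(block.to_string());
            block.to_string()
        }
    }
}

fn main() {
    let mut dedup = TurnMetaDeduper::default();
    let a = "<turn_meta>cwd=/repo</turn_meta>";
    let b = "<turn_meta>cwd=/repo/src</turn_meta>";
    for block in [a, a, b, a] {
        println!("{}", dedup.render(block));
    }
}
```

In this run the second block is replaced by a reference, while the third and fourth are sent in full because each differs from the block that was last sent in full.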

This keeps repeated per-turn metadata from inflating multi-turn request payloads while preserving:

  • full UI transcript content
  • original session history
  • local saved session messages

/cache inspect now reports turn metadata diagnostics:

  • turn_meta original chars
  • turn_meta sent chars
  • turn_meta deduplicated: true/false
  • turn_meta sha256

Additional automated verification:

  • Added tests that oversized tool outputs are truncated only in rendered wire messages.
  • Added tests that identical tool outputs are replaced by compact references on repeated use.
  • Added tests that local/session tool output remains unchanged.
  • Added tests that repeated <turn_meta> blocks are replaced by stable refs in rendered wire messages.
  • Added tests that changed <turn_meta> blocks are sent in full.
  • Added tests that original session messages are not mutated by <turn_meta> deduplication.
  • Added tests that /cache inspect displays tool result and turn metadata budget/dedup diagnostics.

Commands run:

cargo fmt --check
cargo check
cargo clippy --workspace --all-targets --all-features
cargo test -p deepseek-tui turn_meta
cargo test -p deepseek-tui cache_inspect
cargo test

The full cargo test run still reports pre-existing Python/REPL runtime failures that are unrelated to this PR’s cache rendering changes; the new cache, tool result, and turn_meta tests pass.

@Hmbown Hmbown mentioned this pull request May 9, 2026
4 tasks
LiangJianJi pushed a commit to LiangJianJi/DeepSeek-TUI that referenced this pull request May 9, 2026
…tion (Hmbown#1196)

Merge of PR Hmbown#1196 by wplll. Adds:

Cache-aware prompt layering:
  - PromptBuilder struct separates prompt construction from inspection
  - System prompt split into named layers with stability classification
  - Layers classified as static/history/dynamic for cache debugging

/cache inspect command:
  - SHA-256 hashes of each rendered prompt layer
  - Base static prefix hash vs full request prefix hash
  - Static prefix stability status across turns
  - First-divergence tracking from previous request

Wire payload optimization:
  - Tool result budget: large outputs compacted before API request
  - Tool result dedup: repeated outputs replaced by compact refs
  - Turn metadata dedup: repeated <turn_meta> blocks deduplicated
  - Wire-only: local session messages remain unchanged

Project context pack:
  - Deterministic workspace summary injected into stable prefix
  - Configurable via [context] project_pack = false

Cache warmup and improved footer cache display.
Thanks to wplll for the contribution.
LiangJianJi pushed a commit to LiangJianJi/DeepSeek-TUI that referenced this pull request May 9, 2026
…fault

CHANGELOG additions:
- Top-line credit summary: wplll, Liu-Vince, Giggitycountless,
  SamhandsomeLee, barjatiyasaurabh, tyculw, hongyuatcufe, ljlbit.
- New "Added" section properly documenting Hmbown#1196 (cache-aware
  diagnostics, /cache inspect, /cache warmup, payload optimization,
  Project Context Pack). Calls out that the Pack is default-on, adds
  ~1–10 KB to every prompt, and how to opt out via
  [context] project_pack = false.
- Per-item issue reporter credits across the Fixed section.
- Removed Hmbown#1129 from the i18n entry — that's a separate bug we did
  not actually fix (wrong env var name in HTTP system prompt).

README updates: rewrote the "What's New" section in both README.md
and README.zh-CN.md to v0.8.24 with all the same credits and the
project_pack opt-out note.