
Add DeepSeek cache-aware prompt design and /cache inspect diagnostics #1196

Closed
wplll wants to merge 5 commits into Hmbown:main from wplll:feat/deepseek-cache-inspect

Conversation


@wplll wplll commented May 8, 2026


Summary


This PR adds DeepSeek prompt cache awareness to the TUI and introduces a new /cache inspect command for diagnosing cache-related prompt structure.

The main goal is to make DeepSeek context caching easier to reason about by separating stable reusable prompt prefixes from session history and dynamic request content.


Changes



DeepSeek cache-aware prompt structure



  • Introduces a cache-oriented prompt layering model.
  • Separates prompt content into clearer categories:
      • static base prefix
      • session history
      • dynamic request content
  • Keeps reusable prompt components at the front of the request where they are most likely to benefit from DeepSeek context caching.
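
The layering described above can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `LayerKind`, `Layer`, and `render_request` are hypothetical names, and joining layers with newlines is an assumption.

```rust
/// Hypothetical layer categories mirroring the three groups named above.
/// The derived ordering follows declaration order: Static < History < Dynamic.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum LayerKind {
    Static,  // reusable base prefix
    History, // prior conversation turns
    Dynamic, // content specific to the current request
}

/// Hypothetical rendered prompt layer.
#[derive(Debug, Clone)]
struct Layer {
    kind: LayerKind,
    text: String,
}

/// Orders layers so static content comes first, then history, then dynamic.
/// `sort_by_key` is a stable sort, so layers of the same kind keep their
/// relative order. Joining with '\n' is an assumption made for this sketch.
fn render_request(mut layers: Vec<Layer>) -> String {
    layers.sort_by_key(|layer| layer.kind);
    layers
        .iter()
        .map(|layer| layer.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the static layers always render first, two requests in the same session begin with the same bytes for as long as the static content itself does not change, which is the property prefix caching depends on.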
    

Cache usage metrics



  • Parses DeepSeek cache usage fields when available:
      • prompt_cache_hit_tokens
      • prompt_cache_miss_tokens
  • Computes and displays cache hit information for recent requests.
  • Handles providers or responses without cache fields gracefully.
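
One plausible way to turn the two usage fields into a hit ratio is sketched below. Only the field names `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` come from the PR text; the `Usage` struct, the `cache_hit_ratio` function, and the choice of `hit / (hit + miss)` as the displayed metric are assumptions.

```rust
/// Hypothetical usage record. Both fields are optional because some
/// providers or responses do not report cache usage at all.
#[derive(Debug, Clone, Copy, Default)]
struct Usage {
    prompt_cache_hit_tokens: Option<u64>,
    prompt_cache_miss_tokens: Option<u64>,
}

/// Returns the cache hit ratio in the range 0.0..=1.0.
/// Returns `None` when either field is missing, when the sum overflows,
/// or when no prompt tokens were counted, so callers can skip the
/// display instead of showing a misleading 0%.
fn cache_hit_ratio(usage: &Usage) -> Option<f64> {
    let hit = usage.prompt_cache_hit_tokens?;
    let miss = usage.prompt_cache_miss_tokens?;
    let total = hit.checked_add(miss)?;
    if total == 0 {
        return None;
    }
    Some(hit as f64 / total as f64)
}
```

Returning `Option` rather than defaulting to zero keeps "no cache data" distinguishable from "cache data showing zero hits".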
    

/cache inspect



  • Adds a /cache inspect command to inspect the rendered prompt structure without printing the full prompt text.
  • Displays SHA-256 hashes for rendered prompt layers.
  • Reports:
      • Base static prefix hash
      • Full request prefix hash
      • static prefix stability status
      • first divergence from the previous request
  • Classifies rendered layers as:
      • static
      • history
      • dynamic
  • Helps distinguish expected request-history changes from actual static-prefix instability.
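
The stability check and first-divergence report could work along these lines. This is a sketch under assumptions: `LayerDigest`, `first_divergence`, and `static_prefix_stable` are hypothetical names, and the digests are taken as precomputed hex strings, so hashing itself is out of scope here.

```rust
/// Hypothetical per-layer entry: the layer class and its hex digest.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LayerDigest {
    kind: &'static str, // "static", "history", or "dynamic"
    digest: String,
}

/// Index of the first layer that differs between two rendered requests.
/// If one request is a strict prefix of the other, the divergence is at
/// the length of the shorter one. Returns `None` when both are identical.
fn first_divergence(prev: &[LayerDigest], curr: &[LayerDigest]) -> Option<usize> {
    let common = prev.len().min(curr.len());
    for i in 0..common {
        if prev[i] != curr[i] {
            return Some(i);
        }
    }
    if prev.len() != curr.len() {
        Some(common)
    } else {
        None
    }
}

/// True when the leading run of static layers is identical in both requests.
fn static_prefix_stable(prev: &[LayerDigest], curr: &[LayerDigest]) -> bool {
    let a = prev.iter().take_while(|l| l.kind == "static");
    let b = curr.iter().take_while(|l| l.kind == "static");
    a.eq(b)
}
```

With this split, a divergence index that falls in the history or dynamic region while `static_prefix_stable` returns true corresponds to the expected case of normal conversation growth.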
    

Safety and privacy



  • /cache inspect does not print full prompt contents by default.
  • Hashes are used for diagnostics instead of exposing full rendered prompt text.
    

Motivation


DeepSeek context caching can significantly reduce input cost when requests share a stable prefix. However, before this change it was difficult to tell whether cache misses were caused by changes in the reusable static prefix or by normal conversation history growth.

This PR adds both the cache-aware prompt design and the inspection tooling needed to debug cache behavior in real sessions.

In particular, /cache inspect makes it possible to verify that the static base prefix remains stable across turns while allowing the full rendered request to change as history, tool results, and user inputs evolve.


Example behavior


Across multiple turns in the same session, /cache inspect can now show:


  • the same Base static prefix hash
  • a different Full request prefix hash
  • Static base prefix stability: OK
  • the first changed layer in the session history or dynamic request area
    
    This makes it easier to confirm that the reusable DeepSeek cache prefix is stable even when the full request changes.
    

Testing


Manual verification performed:


  • Asked multiple questions in the same session.
  • Ran /cache inspect after each request.
  • Confirmed Base static prefix hash remains stable across turns.
  • Confirmed Full request prefix hash changes as history grows.
  • Confirmed cache hit / miss metrics are displayed when DeepSeek returns cache usage fields.
  • Confirmed full prompt text is not printed by /cache inspect.
    

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This PR implements a "Project Context Pack" feature that creates a deterministic workspace summary for the system prompt, alongside new /cache inspect and /cache warmup debug commands for analyzing prompt stability and priming provider caches. It also enhances the TUI footer with detailed cache telemetry. Feedback recommends expanding the ignored directory list (e.g., target, .vscode) and increasing the variety of recognized configuration and source file extensions to improve the context pack's efficiency and coverage.

3 comment threads on crates/tui/src/project_context.rs

wplll commented May 8, 2026

Author

Tool result budget and deduplication

Adds wire-only budgeting for tool result messages sent to DeepSeek.

Large tool outputs are now compacted before they enter the rendered API request while preserving the full output in the local UI/session state. By default, a single tool result keeps up to 12,000 chars, including:

  • tool name
  • command/query
  • exit status
  • original char count
  • SHA-256 hash
  • first 4,000 chars
  • last 4,000 chars
  • an explicit truncation placeholder for omitted middle content
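
The head/tail truncation step might look like the sketch below. The 12,000 and 4,000 figures come from the description above; everything else is an assumption, including the function name `compact_tool_output`, the placeholder wording, and the reading that outputs within the 12,000-char budget are sent unchanged. The metadata header (tool name, exit status, hash, and so on) is left out.

```rust
const WIRE_BUDGET_CHARS: usize = 12_000; // per-result budget from the PR text
const HEAD_CHARS: usize = 4_000; // leading chars kept when truncating
const TAIL_CHARS: usize = 4_000; // trailing chars kept when truncating

/// Returns the wire text and whether truncation happened.
/// Counts `char`s rather than bytes so multi-byte UTF-8 text is never
/// split in the middle of a code point. The input is only borrowed, so
/// the caller's full output is left untouched.
fn compact_tool_output(output: &str) -> (String, bool) {
    let total = output.chars().count();
    if total <= WIRE_BUDGET_CHARS {
        return (output.to_string(), false);
    }
    let head: String = output.chars().take(HEAD_CHARS).collect();
    let tail: String = output.chars().skip(total - TAIL_CHARS).collect();
    let omitted = total - HEAD_CHARS - TAIL_CHARS;
    // Hypothetical placeholder format for the omitted middle section.
    let wire = format!("{head}\n[... {omitted} chars omitted ...]\n{tail}");
    (wire, true)
}
```

Keeping both ends reflects that command output often carries its most useful information at the start (the invocation, early errors) and at the end (the final status).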

Repeated identical tool results are deduplicated in the rendered wire history using a compact stable reference instead of resending the full output.

This reduces cache miss pressure from large repeated dynamic tool messages without deleting local logs or changing the visible transcript.

/cache inspect now also reports tool result budget metadata:

  • original chars
  • sent chars
  • truncated: true/false
  • deduplicated: true/false

Turn metadata deduplication

Adds wire-only deduplication for repeated <turn_meta> blocks.

The first rendered <turn_meta> block is kept in full. If a later <turn_meta> block is identical to the most recent full one, the rendered API request replaces it with a stable reference:

<TURN_META_REF sha="..." original_chars="..." />

If the metadata changes, the full block is sent again and becomes the new comparison point.
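
The comparison rule above can be sketched as follows. The `<TURN_META_REF ... />` shape is taken from this comment; `render_turn_meta` and `digest` are hypothetical names. To keep the example dependency-free, `digest` uses the standard library's `DefaultHasher` as a stand-in for SHA-256. It is not a cryptographic hash and its output is not guaranteed to be stable across Rust releases, so a real implementation would use a SHA-256 crate instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest for this sketch only. NOT SHA-256.
fn digest(text: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Renders <turn_meta> blocks for the wire. The input slice is only
/// borrowed, so the original session messages are never mutated.
fn render_turn_meta(blocks: &[String]) -> Vec<String> {
    // The most recent block that was sent in full.
    let mut last_full: Option<&str> = None;
    let mut wire = Vec::with_capacity(blocks.len());
    for block in blocks {
        if last_full == Some(block.as_str()) {
            // Identical to the last full block: send a compact reference.
            wire.push(format!(
                "<TURN_META_REF sha=\"{}\" original_chars=\"{}\" />",
                digest(block),
                block.chars().count()
            ));
        } else {
            // First or changed block: send in full and make it the new
            // comparison point.
            wire.push(block.clone());
            last_full = Some(block.as_str());
        }
    }
    wire
}
```

Comparing only against the most recent full block means a block that changes and later changes back is sent in full again, which matches the rule as stated.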

This keeps repeated per-turn metadata from inflating multi-turn request payloads while preserving:

  • full UI transcript content
  • original session history
  • local saved session messages

/cache inspect now reports turn metadata diagnostics:

  • turn_meta original chars
  • turn_meta sent chars
  • turn_meta deduplicated: true/false
  • turn_meta sha256

Additional automated verification:

  • Added tests that oversized tool outputs are truncated only in rendered wire messages.
  • Added tests that identical tool outputs are replaced by compact references on repeated use.
  • Added tests that local/session tool output remains unchanged.
  • Added tests that repeated <turn_meta> blocks are replaced by stable refs in rendered wire messages.
  • Added tests that changed <turn_meta> blocks are sent in full.
  • Added tests that original session messages are not mutated by <turn_meta> deduplication.
  • Added tests that /cache inspect displays tool result and turn metadata budget/dedup diagnostics.

Commands run:

cargo fmt --check
cargo check
cargo clippy --workspace --all-targets --all-features
cargo test -p deepseek-tui turn_meta
cargo test -p deepseek-tui cache_inspect
cargo test

cargo test still reports pre-existing Python/REPL runtime failures that are unrelated to this PR’s cache rendering changes; the new cache/tool/turn_meta tests pass.


Hmbown commented May 23, 2026

Owner

This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current main and reopen. The whale brothers are waiting for you 🐋

@Hmbown Hmbown closed this May 23, 2026