feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss by Hmbown · Pull Request #278 · Hmbown/CodeWhale

Hmbown · 2026-05-02T00:54:05Z

Summary

Step 1 of #263 — without per-turn cache telemetry on screen the prefix-cache audit is unfounded speculation. This adds the foundation.

The DeepSeek API already returns `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` per turn, and we already store the latest values on `App`. This adds:

- `App::turn_cache_history: VecDeque<TurnCacheRecord>`: a 50-turn ring populated at the same site as `last_prompt_cache_*_tokens` (`tui/ui.rs:716`)
- `/cache [count]` slash command (default count 10, clamps to ring size) that renders a fixed-width table the user can paste into a bug report:

```
Cache telemetry — last 4 of 4 turn(s) (model: deepseek-v4-pro)
────────────────────────────────────────────────────────────────────────────
turn   in    out   hit   miss   replay   ratio   age
────────────────────────────────────────────────────────────────────────────
   1   4000    200   3000   1000       —    75.0%   0s
   2   6000    250   3000   3000     150    50.0%   0s
   3   5000    100   2500  2500*       —    50.0%   0s
   4   1000     50      —      —       —        —   0s
────────────────────────────────────────────────────────────────────────────
Σ in: 16000   Σ hit: 8500   Σ miss: 6500   avg hit ratio: 56.7%

* miss inferred from input − hit when the provider did not report it explicitly.
Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; …
```
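The capped ring described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TurnCacheRecord` here is a trimmed-down stand-in with only three fields, and `push_capped` is a hypothetical helper name. Only the cap of 50 and the evict-oldest behaviour come from the description.

```rust
use std::collections::VecDeque;

/// Hypothetical, trimmed-down stand-in for the PR's `TurnCacheRecord`.
#[derive(Debug, Clone, PartialEq)]
struct TurnCacheRecord {
    input_tokens: u32,
    cache_hit_tokens: Option<u32>,
    cache_miss_tokens: Option<u32>,
}

/// Cap taken from the PR description (50 turns).
const TURN_CACHE_HISTORY_CAP: usize = 50;

/// Push a record, evicting the oldest entries once the ring is full.
/// (Assumed helper name; the PR only says "a capped push helper".)
fn push_capped(history: &mut VecDeque<TurnCacheRecord>, rec: TurnCacheRecord) {
    while history.len() >= TURN_CACHE_HISTORY_CAP {
        history.pop_front();
    }
    history.push_back(rec);
}

fn main() {
    let mut history = VecDeque::with_capacity(TURN_CACHE_HISTORY_CAP);
    for turn in 0..60u32 {
        let rec = TurnCacheRecord {
            input_tokens: turn,
            cache_hit_tokens: None,
            cache_miss_tokens: None,
        };
        push_capped(&mut history, rec);
    }
    // 60 pushes into a 50-slot ring: turns 0..=9 have been evicted.
    let oldest = history.front().map(|r| r.input_tokens);
    println!("len = {}, oldest input = {:?}", history.len(), oldest);
}
```

A `VecDeque` keeps both the eviction (`pop_front`) and the append (`push_back`) at amortized constant cost, which is presumably why it was chosen over a plain `Vec`.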

Edge cases handled by the formatter

- No telemetry yet → friendly "no turns recorded" message instead of an empty table
- `cache_hit_tokens = None` (provider didn't report) → row renders em-dashes and is excluded from aggregates, so one missing-telemetry turn can't make the average look broken
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark with `*`; footer documents the asterisk
- Ring at cap (50) → push evicts oldest

All four paths are covered by tests; the regression test `turn_cache_history_is_capped_at_50` pins the cap.
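The hit/miss handling listed above can be illustrated with a small sketch. It assumes a simplified `Turn` struct whose field names follow the PR's diff. The helpers `resolve` and `avg_hit_ratio` are hypothetical names introduced only for this example. The sample data is the four-turn table from the summary.

```rust
/// Hypothetical per-turn record; field names follow the PR's diff.
struct Turn {
    input_tokens: u32,
    cache_hit_tokens: Option<u32>,
    cache_miss_tokens: Option<u32>,
}

/// Returns (hit, miss, miss_was_inferred), or None when the provider
/// reported no cache telemetry for the turn.
fn resolve(turn: &Turn) -> Option<(u32, u32, bool)> {
    let hit = turn.cache_hit_tokens?;
    match turn.cache_miss_tokens {
        Some(miss) => Some((hit, miss, false)),
        // Miss not reported: infer it as input - hit, never going below zero.
        None => Some((hit, turn.input_tokens.saturating_sub(hit), true)),
    }
}

/// Aggregate hit ratio in percent, skipping turns without telemetry.
fn avg_hit_ratio(turns: &[Turn]) -> Option<f64> {
    let (mut hit_sum, mut miss_sum) = (0u64, 0u64);
    for (hit, miss, _) in turns.iter().filter_map(resolve) {
        hit_sum += u64::from(hit);
        miss_sum += u64::from(miss);
    }
    let accounted = hit_sum + miss_sum;
    (accounted > 0).then(|| 100.0 * hit_sum as f64 / accounted as f64)
}

fn main() {
    let turns = [
        Turn { input_tokens: 4000, cache_hit_tokens: Some(3000), cache_miss_tokens: Some(1000) },
        Turn { input_tokens: 6000, cache_hit_tokens: Some(3000), cache_miss_tokens: Some(3000) },
        Turn { input_tokens: 5000, cache_hit_tokens: Some(2500), cache_miss_tokens: None },
        Turn { input_tokens: 1000, cache_hit_tokens: None, cache_miss_tokens: None },
    ];
    match avg_hit_ratio(&turns) {
        Some(r) => println!("avg hit ratio: {r:.1}%"), // 8500 / 15000
        None => println!("avg hit ratio: (no telemetry)"),
    }
}
```

With the sample data the third turn resolves to an inferred miss of 2500 and the fourth turn is skipped, which reproduces the 56.7% shown in the table footer.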

What this unlocks

With per-turn telemetry on screen, step 2 of the audit (byte-diff harness) can be driven by measurements rather than guesswork. Step 3 (suspect-by-suspect bisection) can verify each fix by checking that `/cache` shows the hit ratio jump.

Test plan

- `cargo test -p deepseek-tui --bin deepseek-tui --locked` (1742/1744 — 2 ignored unrelated)
- `cargo fmt --all -- --check`
- `cargo clippy -p deepseek-tui --all-targets --locked -- -D warnings`

🤖 Generated with Claude Code

…/miss

Step 1 of #263. Without per-turn telemetry the prefix-cache audit is unfounded speculation; the rest of the issue's investigation steps depend on this surface.

The DeepSeek API already returns `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` per turn, and we already store the *latest* on App. This adds a 50-turn ring (`turn_cache_history`) populated at the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]` slash command that renders a fixed-width table of the last N turns with per-turn ratios and a session aggregate. Default count is 10; larger values clamp to the ring size.

Edge cases the formatter handles:

- No telemetry yet → friendly "no turns recorded" message
- `cache_hit_tokens = None` (provider didn't report) → row renders all em-dashes and is excluded from session aggregates so one missing-telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark the cell with `*`. Footer documents the asterisk.
- Ring at cap (50) → push evicts oldest.

Tests cover all four paths plus the cap.

gemini-code-assist

Code Review

This pull request introduces a new /cache debug command to display per-turn DeepSeek prefix-cache telemetry. The changes include a new TurnCacheRecord struct, a capped history buffer in the application state, and logic to format the telemetry into a table. Review feedback identifies several alignment and scalability issues in the table rendering, specifically recommending increased column widths for large token counts and consistent padding for ratio strings and separators.

gemini-code-assist · 2026-05-02T00:56:44Z

+    header.push_str("turn   in    out   hit   miss   replay   ratio   age\n");
+    header.push_str(&"─".repeat(76));
+    header.push('\n');


The table header and separator are misaligned with the row format string. Additionally, the column widths for token counts (in, out, hit, miss, replay) are set to 5 or 6, which is insufficient for models with large context windows (e.g., DeepSeek-V3's 128k context results in 6-digit token counts). Using a consistent format! call for the header and increasing column widths to 7 characters ensures alignment and provides better headroom.

Suggested change

-    header.push_str("turn   in    out   hit   miss   replay   ratio   age\n");
-    header.push_str(&"─".repeat(76));
-    header.push('\n');
+    header.push_str(&format!(
+        "{:>4} {:>7} {:>7} {:>7} {:>7} {:>7} {:>6} {}\n",
+        "turn", "in", "out", "hit", "miss", "replay", "ratio", "age"
+    ));
+    header.push_str(&"─".repeat(64));
+    header.push('\n');
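One way to rule out header/row drift of the kind the reviewer describes is to route the header and every data row through the same formatter. The sketch below is an illustration under assumptions: `table_line` and `separator` are hypothetical helpers, and the two-space gaps between columns are a guess, since the suggestion's exact spacing is not recoverable from this page. Only the column widths (4, 7, 7, 7, 7, 7, 6) come from the suggested change.

```rust
/// Format one table line. Header and data rows both go through this
/// function, so their columns cannot drift apart.
/// Gaps of two spaces between columns are an assumption.
fn table_line(c: [&str; 8]) -> String {
    format!(
        "{:>4}  {:>7}  {:>7}  {:>7}  {:>7}  {:>7}  {:>6}  {}",
        c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]
    )
}

/// Separator sized from the header itself instead of a hard-coded count.
/// `chars().count()` is used because the width in `format!` pads by
/// characters, not bytes (relevant for the multi-byte `—` and `─`).
fn separator(header: &str) -> String {
    "─".repeat(header.chars().count())
}

fn main() {
    let header = table_line(["turn", "in", "out", "hit", "miss", "replay", "ratio", "age"]);
    println!("{header}");
    println!("{}", separator(&header));
    println!("{}", table_line(["1", "4000", "200", "3000", "1000", "—", "75.0%", "0s"]));
    println!("{}", table_line(["4", "128000", "50", "—", "—", "—", "—", "0s"]));
}
```

Deriving the separator length from the rendered header also removes the need to keep a magic number such as 76 or 64 in sync with the format string.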

gemini-code-assist · 2026-05-02T00:56:45Z

+        totals_input += u64::from(rec.input_tokens);
+
+        let replay_cell = rec
+            .reasoning_replay_tokens
+            .map_or_else(|| "—".to_string(), |t| t.to_string());
+        let age = humanize_age(now.saturating_duration_since(rec.recorded_at));
+
+        // No cache telemetry → render `—` everywhere and don't pollute totals
+        // with inferred zeros. Some providers (and some routes inside DeepSeek)
+        // skip the cache fields; including a synthesized 0/N for those turns
+        // would make every aggregate ratio look broken.
+        let Some(hit) = rec.cache_hit_tokens else {
+            body.push_str(&format!(
+                "{turn:>4}  {input:>5}  {output:>5}  {hit:>5}  {miss:>5}  {replay:>6}   {ratio:>6}   {age}\n",
+                turn = turn_index,
+                input = rec.input_tokens,
+                output = rec.output_tokens,
+                hit = "—",
+                miss = "—",
+                replay = replay_cell,
+                ratio = "—",
+                age = age,
+            ));
+            continue;
+        };
+
+        let miss_reported = rec.cache_miss_tokens;
+        let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
+        let accounted = u64::from(hit) + u64::from(miss);
+        let ratio = if accounted == 0 {
+            "    —".to_string()
+        } else {
+            format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
+        };
+        totals_hit += u64::from(hit);
+        totals_miss += u64::from(miss);
+
+        let miss_cell = match miss_reported {
+            Some(_) => format!("{miss}"),
+            None => format!("{miss}*"),
+        };
+
+        body.push_str(&format!(
+            "{turn:>4}  {input:>5}  {output:>5}  {hit:>5}  {miss:>5}  {replay:>6}   {ratio}   {age}\n",
+            turn = turn_index,
+            input = rec.input_tokens,
+            output = rec.output_tokens,
+            hit = hit,
+            miss = miss_cell,
+            replay = replay_cell,
+            ratio = ratio,
+            age = age,
+        ));


There are several issues in the row rendering logic:

- `totals_input` is incremented for every turn, including those without telemetry. This makes the footer `Σ in` inconsistent with `Σ hit + Σ miss`, which is confusing when auditing cache performance. It should only sum turns that contribute to the telemetry aggregates.

- The ratio string for the `accounted == 0` case (line 206) is 5 characters long (`"    —"`), while the numeric ratio (line 208) and the "no telemetry" case (line 196) are 6 characters long, causing misalignment.

- Column widths should be increased to 7 to accommodate larger token counts.

    for (i, rec) in rows.iter().enumerate() {
        let turn_index = absolute_start + i + 1;

        let replay_cell = rec
            .reasoning_replay_tokens
            .map_or_else(|| "—".to_string(), |t| t.to_string());
        let age = humanize_age(now.saturating_duration_since(rec.recorded_at));

        // No cache telemetry → render `—` everywhere and don't pollute totals
        // with inferred zeros. Some providers (and some routes inside DeepSeek)
        // skip the cache fields; including a synthesized 0/N for those turns
        // would make every aggregate ratio look broken.
        let Some(hit) = rec.cache_hit_tokens else {
            body.push_str(&format!(
                "{turn:>4} {input:>7} {output:>7} {hit:>7} {miss:>7} {replay:>7} {ratio:>6} {age}\n",
                turn = turn_index,
                input = rec.input_tokens,
                output = rec.output_tokens,
                hit = "—",
                miss = "—",
                replay = replay_cell,
                ratio = "—",
                age = age,
            ));
            continue;
        };

        totals_input += u64::from(rec.input_tokens);

        let miss_reported = rec.cache_miss_tokens;
        let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
        let accounted = u64::from(hit) + u64::from(miss);
        let ratio = if accounted == 0 {
            " —".to_string()
        } else {
            format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
        };
        totals_hit += u64::from(hit);
        totals_miss += u64::from(miss);

        let miss_cell = match miss_reported {
            Some(_) => format!("{miss}"),
            None => format!("{miss}*"),
        };

        body.push_str(&format!(
            "{turn:>4} {input:>7} {output:>7} {hit:>7} {miss:>7} {replay:>7} {ratio} {age}\n",
            turn = turn_index,
            input = rec.input_tokens,
            output = rec.output_tokens,
            hit = hit,
            miss = miss_cell,
            replay = replay_cell,
            ratio = ratio,
            age = age,
        ));
    }
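The ratio-width mismatch called out in the review can be avoided by producing the ratio cell in one place and padding both branches to the same width. The following is a sketch only: `ratio_cell` is a hypothetical helper, not a function from the PR, and the width of 6 is taken from the `{ratio:>6}` used for the no-telemetry row.

```rust
/// Render the hit-ratio cell at a fixed width of 6 characters.
/// Hypothetical helper; the PR builds this string inline.
fn ratio_cell(hit: u32, miss: u32) -> String {
    let accounted = u64::from(hit) + u64::from(miss);
    if accounted == 0 {
        // Pad the placeholder with the formatter instead of hand-counted
        // spaces, so it always matches the numeric branch below.
        format!("{:>6}", "—")
    } else {
        // Five characters for the number plus the trailing '%' = 6.
        format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
    }
}

fn main() {
    for (hit, miss) in [(3000u32, 1000u32), (5, 0), (0, 0)] {
        let cell = ratio_cell(hit, miss);
        println!("[{cell}] width = {}", cell.chars().count());
    }
}
```

Letting `format!` do the padding means a later change to the column width only needs to touch the two width specifiers rather than a string literal full of spaces.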

gemini-code-assist · 2026-05-02T00:56:45Z

+    footer.push_str(&"─".repeat(76));
+    footer.push('\n');


The separator length should be updated to match the new table width.

Suggested change

-    footer.push_str(&"─".repeat(76));
-    footer.push('\n');
+    let mut footer = String::new();
+    footer.push_str(&"─".repeat(64));

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Copilot

Pull request overview

Adds a new debug surface to make DeepSeek per-turn prefix-cache telemetry visible in the TUI, enabling measurable cache-hit auditing for issue #263.

Changes:

Record per-turn cache telemetry into a capped (50) VecDeque on App.
Add /cache [count] slash command to render recent turns as a copy/paste-friendly table with aggregates.
Register the new command in the command registry and add unit tests for edge cases and capping.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File	Description
`crates/tui/src/tui/ui.rs`	Appends a `TurnCacheRecord` at turn finalization time using usage telemetry.
`crates/tui/src/tui/app.rs`	Introduces `TurnCacheRecord`, `turn_cache_history`, and a capped push helper.
`crates/tui/src/commands/mod.rs`	Registers the `/cache` command in the command list and dispatcher.
`crates/tui/src/commands/debug.rs`	Implements `/cache` rendering/formatting and adds tests for edge cases and capping.

+pub fn cache(app: &mut App, arg: Option<&str>) -> CommandResult {
+    let want = arg
+        .and_then(|s| s.trim().parse::<usize>().ok())
+        .unwrap_or(10);
+    let cap = app.turn_cache_history.len();
+    let count = want
+        .min(cap)
+        .min(crate::tui::app::App::TURN_CACHE_HISTORY_CAP);
+


+        "Cache telemetry — last {} of {} turn(s) (model: {})\n",
+        rows.len(),
+        total,
+        app.model


+        body.push_str(&format!(
+            "{turn:>4}  {input:>5}  {output:>5}  {hit:>5}  {miss:>5}  {replay:>6}   {ratio}   {age}\n",
+            turn = turn_index,


+        let ratio = if accounted == 0 {
+            "    —".to_string()
+        } else {
+            format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
+        };


+    footer.push_str(&"─".repeat(76));
+    footer.push('\n');
+    footer.push_str(&format!(
+        "Σ in: {totals_input}   Σ hit: {totals_hit}   Σ miss: {totals_miss}   avg hit ratio: {avg_ratio}\n",


+        "* miss inferred from input − hit when the provider did not report it explicitly.\n",
+    );
+    footer.push_str(
+        "Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; \n\


+    ///   V4-thinking tool-calling turns (chars/3 heuristic). Helps separate
+    ///   cache misses caused by reasoning-replay churn from misses caused by
+    ///   real prefix instability.


Copilot AI review requested due to automatic review settings May 2, 2026 00:54

Copilot started reviewing on behalf of Hmbown May 2, 2026 00:54

gemini-code-assist Bot reviewed May 2, 2026

devin-ai-integration Bot reviewed May 2, 2026

Copilot AI reviewed May 2, 2026

This was referenced May 2, 2026

working_set::summary_block churns the cache prefix every turn ( / interpolation) #280

Closed

Expand zh-Hans localization beyond the 27 UI strings — slash commands, debug output, errors #285

Closed

Hmbown merged commit e928c00 into feat/v0.8.4 May 2, 2026
6 checks passed

Hmbown deleted the feat/issue-263-cache-debug-command branch May 2, 2026 01:30
