
feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss #278

Merged
Hmbown merged 1 commit into feat/v0.8.4 from feat/issue-263-cache-debug-command
May 2, 2026
Conversation

@Hmbown (Owner) commented May 2, 2026

Summary

Step 1 of #263 — without per-turn cache telemetry on screen the prefix-cache audit is unfounded speculation. This adds the foundation.

The DeepSeek API already returns `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` per turn, and we already store the latest values on `App`. This adds:

  • `App::turn_cache_history: VecDeque<TurnCacheRecord>`, a 50-turn ring populated at the same site as `last_prompt_cache_*_tokens` (`tui/ui.rs:716`)
  • `/cache [count]` slash command (default count 10, clamps to ring size) that renders a fixed-width table the user can paste into a bug report:

```
Cache telemetry — last 4 of 4 turn(s) (model: deepseek-v4-pro)
────────────────────────────────────────────────────────────────────────────
turn     in    out    hit   miss  replay   ratio  age
────────────────────────────────────────────────────────────────────────────
   1   4000    200   3000   1000       —   75.0%  0s
   2   6000    250   3000   3000     150   50.0%  0s
   3   5000    100   2500  2500*       —   50.0%  0s
   4   1000     50      —      —       —       —  0s
────────────────────────────────────────────────────────────────────────────
Σ in: 16000 Σ hit: 8500 Σ miss: 6500 avg hit ratio: 56.7%

* miss inferred from input − hit when the provider did not report it explicitly.
Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; …
```
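The capped ring described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the record is trimmed to three fields, and the helper name `push_capped` is hypothetical. Only the use of `VecDeque`, the cap of 50, and the evict-oldest behavior come from the description.

```rust
use std::collections::VecDeque;

/// Trimmed, hypothetical stand-in for the per-turn record.
#[derive(Debug, Clone, PartialEq)]
pub struct TurnCacheRecord {
    pub input_tokens: u32,
    pub cache_hit_tokens: Option<u32>,
    pub cache_miss_tokens: Option<u32>,
}

/// Ring capacity stated in the PR description.
pub const TURN_CACHE_HISTORY_CAP: usize = 50;

/// Hypothetical helper: append a record and evict the oldest entries
/// so the ring never holds more than the cap.
pub fn push_capped(history: &mut VecDeque<TurnCacheRecord>, rec: TurnCacheRecord) {
    while history.len() >= TURN_CACHE_HISTORY_CAP {
        history.pop_front();
    }
    history.push_back(rec);
}
```

Pushing 60 records through this helper leaves the 50 most recent ones, with the oldest surviving record at the front.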

Edge cases handled by the formatter

  • No telemetry yet → friendly "no turns recorded" message instead of an empty table
  • `cache_hit_tokens = None` (provider didn't report) → row renders em-dashes and is excluded from aggregates, so one missing-telemetry turn can't make the average look broken
  • `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark with `*`; footer documents the asterisk
  • Ring at cap (50) → push evicts oldest

All four paths are covered by tests; the regression test `turn_cache_history_is_capped_at_50` pins the cap.
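As an illustration of the inference and exclusion rules listed above, here is a small sketch. The `Turn` struct and both function names are hypothetical; the rules themselves (skip turns without a hit count, infer miss as `input − hit` when it is not reported) are taken from the list.

```rust
/// Hypothetical, trimmed view of one turn's telemetry.
pub struct Turn {
    pub input_tokens: u32,
    pub cache_hit_tokens: Option<u32>,
    pub cache_miss_tokens: Option<u32>,
}

/// Returns `(miss, inferred)` for a turn whose hit count was reported.
/// `inferred` is true when the miss count had to be derived from input − hit.
pub fn resolve_miss(input: u32, hit: u32, miss_reported: Option<u32>) -> (u32, bool) {
    match miss_reported {
        Some(m) => (m, false),
        None => (input.saturating_sub(hit), true),
    }
}

/// Aggregate hit ratio in percent over the turns that reported a hit count.
/// Turns without telemetry are skipped; returns `None` if nothing is accounted for.
pub fn aggregate_hit_ratio(turns: &[Turn]) -> Option<f64> {
    let (mut hit_sum, mut miss_sum) = (0u64, 0u64);
    for t in turns {
        let Some(hit) = t.cache_hit_tokens else { continue };
        let (miss, _) = resolve_miss(t.input_tokens, hit, t.cache_miss_tokens);
        hit_sum += u64::from(hit);
        miss_sum += u64::from(miss);
    }
    let accounted = hit_sum + miss_sum;
    (accounted > 0).then(|| 100.0 * hit_sum as f64 / accounted as f64)
}
```

Fed the four turns from the sample table, this yields 8500 / (8500 + 6500), which prints as 56.7% at one decimal place.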

What this unlocks

With per-turn telemetry on screen, step 2 of the audit (byte-diff harness) can be driven by measurements. Step 3 (suspect-by-suspect bisection) can verify each fix by watching the ratio jump in `/cache`.

Test plan

  • `cargo test -p deepseek-tui --bin deepseek-tui --locked` (1742/1744 — 2 ignored unrelated)
  • `cargo fmt --all -- --check`
  • `cargo clippy -p deepseek-tui --all-targets --locked -- -D warnings`

🤖 Generated with Claude Code



…/miss

Step 1 of #263. Without per-turn telemetry the prefix-cache audit is
unfounded speculation; the rest of the issue's investigation steps
depend on this surface.

The DeepSeek API already returns `prompt_cache_hit_tokens` and
`prompt_cache_miss_tokens` per turn, and we already store the *latest*
on App. This adds a 50-turn ring (`turn_cache_history`) populated at
the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]`
slash command that renders a fixed-width table of the last N turns
with per-turn ratios and a session aggregate. Default count is 10;
larger values clamp to the ring size.

Edge cases the formatter handles:

- No telemetry yet → friendly "no turns recorded" message
- `cache_hit_tokens = None` (provider didn't report) → row renders all
  em-dashes and is excluded from session aggregates so one missing-
  telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as
  `input − hit` and mark the cell with `*`. Footer documents the
  asterisk.
- Ring at cap (50) → push evicts oldest.

Tests cover all four paths plus the cap.
Copilot AI review requested due to automatic review settings May 2, 2026 00:54

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a new /cache debug command to display per-turn DeepSeek prefix-cache telemetry. The changes include a new TurnCacheRecord struct, a capped history buffer in the application state, and logic to format the telemetry into a table. Review feedback identifies several alignment and scalability issues in the table rendering, specifically recommending increased column widths for large token counts and consistent padding for ratio strings and separators.

Comment on lines +167 to +169
    header.push_str("turn in out hit miss replay ratio age\n");
    header.push_str(&"─".repeat(76));
    header.push('\n');


medium

The table header and separator are misaligned with the row format string. Additionally, the column widths for token counts (in, out, hit, miss, replay) are set to 5 or 6, which is insufficient for models with large context windows (e.g., DeepSeek-V3's 128k context results in 6-digit token counts). Using a consistent format! call for the header and increasing column widths to 7 characters ensures alignment and provides better headroom.

Suggested change

    -header.push_str("turn in out hit miss replay ratio age\n");
    -header.push_str(&"─".repeat(76));
    -header.push('\n');
    +header.push_str(&format!(
    +    "{:>4} {:>7} {:>7} {:>7} {:>7} {:>7} {:>6} {}\n",
    +    "turn", "in", "out", "hit", "miss", "replay", "ratio", "age"
    +));
    +header.push_str(&"─".repeat(64));
    +header.push('\n');
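One way to make the header and the rows impossible to misalign is to route both through a single format string. The sketch below assumes that approach; the names `layout`, `header_line`, `row_line`, and `TOKEN_W` are hypothetical, and the two-space column gaps are an assumption because the original spacing did not survive in the quoted snippets.

```rust
/// Token-count column width; 7 follows the reviewer's suggestion.
const TOKEN_W: usize = 7;

/// Lays out eight cells with one shared format string, so the header
/// and every row use exactly the same column widths.
fn layout(cells: [&str; 8]) -> String {
    format!(
        "{:>4}  {:>w$}  {:>w$}  {:>w$}  {:>w$}  {:>w$}  {:>6}  {}",
        cells[0], cells[1], cells[2], cells[3], cells[4], cells[5], cells[6], cells[7],
        w = TOKEN_W
    )
}

/// Header line built from the shared layout.
pub fn header_line() -> String {
    layout(["turn", "in", "out", "hit", "miss", "replay", "ratio", "age"])
}

/// Data row built from the shared layout.
/// `tokens` is `[input, output, hit, miss]`; a missing replay count renders as a dash.
pub fn row_line(turn: usize, tokens: [u32; 4], replay: Option<u32>, ratio: &str, age: &str) -> String {
    let t = turn.to_string();
    let [input, output, hit, miss] = tokens.map(|n| n.to_string());
    let replay = replay.map_or_else(|| "—".to_string(), |n| n.to_string());
    layout([
        t.as_str(),
        input.as_str(),
        output.as_str(),
        hit.as_str(),
        miss.as_str(),
        replay.as_str(),
        ratio,
        age,
    ])
}
```

Because `format!` pads by character count rather than byte count, the dash cell lines up with numeric cells, and a six-digit count such as 128000 still fits the 7-wide columns.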

Comment on lines +176 to +228
        totals_input += u64::from(rec.input_tokens);

        let replay_cell = rec
            .reasoning_replay_tokens
            .map_or_else(|| "—".to_string(), |t| t.to_string());
        let age = humanize_age(now.saturating_duration_since(rec.recorded_at));

        // No cache telemetry → render `—` everywhere and don't pollute totals
        // with inferred zeros. Some providers (and some routes inside DeepSeek)
        // skip the cache fields; including a synthesized 0/N for those turns
        // would make every aggregate ratio look broken.
        let Some(hit) = rec.cache_hit_tokens else {
            body.push_str(&format!(
                "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio:>6} {age}\n",
                turn = turn_index,
                input = rec.input_tokens,
                output = rec.output_tokens,
                hit = "—",
                miss = "—",
                replay = replay_cell,
                ratio = "—",
                age = age,
            ));
            continue;
        };

        let miss_reported = rec.cache_miss_tokens;
        let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
        let accounted = u64::from(hit) + u64::from(miss);
        let ratio = if accounted == 0 {
            " —".to_string()
        } else {
            format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
        };
        totals_hit += u64::from(hit);
        totals_miss += u64::from(miss);

        let miss_cell = match miss_reported {
            Some(_) => format!("{miss}"),
            None => format!("{miss}*"),
        };

        body.push_str(&format!(
            "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio} {age}\n",
            turn = turn_index,
            input = rec.input_tokens,
            output = rec.output_tokens,
            hit = hit,
            miss = miss_cell,
            replay = replay_cell,
            ratio = ratio,
            age = age,
        ));


medium

There are several issues in the row rendering logic:

  1. totals_input is incremented for every turn, including those without telemetry. This makes the footer Σ in inconsistent with Σ hit + Σ miss, which is confusing when auditing cache performance. It should only sum turns that contribute to the telemetry aggregates.
  2. The ratio string for the accounted == 0 case (line 206) is 5 characters long (" —"), while the numeric ratio (line 208) and the "no telemetry" case (line 196) are 6 characters long, causing misalignment.
  3. Column widths should be increased to 7 to accommodate larger token counts.
    for (i, rec) in rows.iter().enumerate() {
        let turn_index = absolute_start + i + 1;

        let replay_cell = rec
            .reasoning_replay_tokens
            .map_or_else(|| "—".to_string(), |t| t.to_string());
        let age = humanize_age(now.saturating_duration_since(rec.recorded_at));

        // No cache telemetry → render `—` everywhere and don't pollute totals
        // with inferred zeros. Some providers (and some routes inside DeepSeek)
        // skip the cache fields; including a synthesized 0/N for those turns
        // would make every aggregate ratio look broken.
        let Some(hit) = rec.cache_hit_tokens else {
            body.push_str(&format!(
                "{turn:>4}  {input:>7}  {output:>7}  {hit:>7}  {miss:>7}  {replay:>7}   {ratio:>6}   {age}\n",
                turn = turn_index,
                input = rec.input_tokens,
                output = rec.output_tokens,
                hit = "—",
                miss = "—",
                replay = replay_cell,
                ratio = "—",
                age = age,
            ));
            continue;
        };

        totals_input += u64::from(rec.input_tokens);
        let miss_reported = rec.cache_miss_tokens;
        let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
        let accounted = u64::from(hit) + u64::from(miss);
        let ratio = if accounted == 0 {
            "     —".to_string()
        } else {
            format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
        };
        totals_hit += u64::from(hit);
        totals_miss += u64::from(miss);

        let miss_cell = match miss_reported {
            Some(_) => format!("{miss}"),
            None => format!("{miss}*"),
        };

        body.push_str(&format!(
            "{turn:>4}  {input:>7}  {output:>7}  {hit:>7}  {miss:>7}  {replay:>7}   {ratio}   {age}\n",
            turn = turn_index,
            input = rec.input_tokens,
            output = rec.output_tokens,
            hit = hit,
            miss = miss_cell,
            replay = replay_cell,
            ratio = ratio,
            age = age,
        ));
    }
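The ratio-width issue in point 2 can be avoided by letting the formatter pad both branches to the same width instead of hand-counting spaces in a literal. A minimal sketch follows, with the function name `ratio_cell` being hypothetical:

```rust
/// Renders the ratio cell at a fixed width of six characters,
/// whether or not there is anything to account for.
pub fn ratio_cell(hit: u32, miss: u32) -> String {
    let accounted = u64::from(hit) + u64::from(miss);
    if accounted == 0 {
        // Padding is computed by the formatter, so it cannot drift
        // from the numeric branch below.
        format!("{:>6}", "—")
    } else {
        // Five characters for the number plus the percent sign = six.
        format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
    }
}
```

Width here means character count: the dash cell is six characters but eight bytes, so any check should use `chars().count()` rather than `len()`.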

Comment on lines +242 to +243
    footer.push_str(&"─".repeat(76));
    footer.push('\n');


medium

The separator length should be updated to match the new table width.

Suggested change

    -footer.push_str(&"─".repeat(76));
    -footer.push('\n');
    +let mut footer = String::new();
    +footer.push_str(&"─".repeat(64));
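Rather than keeping a hard-coded separator length in sync by hand, the rule could be derived from the rendered header. This is a sketch of that alternative, not something proposed in the review itself; `separator_for` is a hypothetical name, and it assumes every header character occupies one terminal column.

```rust
/// Builds a rule exactly as wide as the header line (in characters),
/// ignoring any trailing newline.
pub fn separator_for(header: &str) -> String {
    "─".repeat(header.trim_end_matches('\n').chars().count())
}
```

With this, changing a column width in the header automatically changes both the top and bottom rules.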

@devin-ai-integration (bot, Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


Copilot AI (Contributor) left a comment

Pull request overview

Adds a new debug surface to make DeepSeek per-turn prefix-cache telemetry visible in the TUI, enabling measurable cache-hit auditing for issue #263.

Changes:

  • Record per-turn cache telemetry into a capped (50) VecDeque on App.
  • Add /cache [count] slash command to render recent turns as a copy/paste-friendly table with aggregates.
  • Register the new command in the command registry and add unit tests for edge cases and capping.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `crates/tui/src/tui/ui.rs` | Appends a `TurnCacheRecord` at turn finalization time using usage telemetry. |
| `crates/tui/src/tui/app.rs` | Introduces `TurnCacheRecord`, `turn_cache_history`, and a capped push helper. |
| `crates/tui/src/commands/mod.rs` | Registers the `/cache` command in the command list and dispatcher. |
| `crates/tui/src/commands/debug.rs` | Implements `/cache` rendering/formatting and adds tests for edge cases and capping. |


Comment on lines +130 to +138
    pub fn cache(app: &mut App, arg: Option<&str>) -> CommandResult {
        let want = arg
            .and_then(|s| s.trim().parse::<usize>().ok())
            .unwrap_or(10);
        let cap = app.turn_cache_history.len();
        let count = want
            .min(cap)
            .min(crate::tui::app::App::TURN_CACHE_HISTORY_CAP);
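The argument handling quoted above can be isolated into a pure function to make its behavior easy to see. This is a sketch under the assumption that the logic is exactly what the snippet shows; `resolve_count` and `DEFAULT_COUNT` are hypothetical names, and the constant is redeclared here only so the example is self-contained.

```rust
/// Ring capacity stated in the PR description.
pub const TURN_CACHE_HISTORY_CAP: usize = 50;
/// Default number of turns shown when no valid count is given.
pub const DEFAULT_COUNT: usize = 10;

/// Resolves the `/cache [count]` argument: a missing or unparsable value
/// falls back to the default, and the result is clamped to what is recorded
/// and to the ring capacity.
pub fn resolve_count(arg: Option<&str>, recorded: usize) -> usize {
    arg.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_COUNT)
        .min(recorded)
        .min(TURN_CACHE_HISTORY_CAP)
}
```

Note two consequences of this shape: non-numeric or negative input such as `abc` or `-1` silently falls back to 10, and an explicit `0` resolves to zero rows.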

Comment on lines +160 to +163
    "Cache telemetry — last {} of {} turn(s) (model: {})\n",
    rows.len(),
    total,
    app.model
Comment on lines +218 to +220
    body.push_str(&format!(
        "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio} {age}\n",
        turn = turn_index,
Comment on lines +205 to +209
    let ratio = if accounted == 0 {
        " —".to_string()
    } else {
        format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
    };
    footer.push_str(&"─".repeat(76));
    footer.push('\n');
    footer.push_str(&format!(
        "Σ in: {totals_input} Σ hit: {totals_hit} Σ miss: {totals_miss} avg hit ratio: {avg_ratio}\n",
        "* miss inferred from input − hit when the provider did not report it explicitly.\n",
    );
    footer.push_str(
        "Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; \n\
Comment thread crates/tui/src/tui/app.rs
Comment on lines +80 to +82
/// V4-thinking tool-calling turns (chars/3 heuristic). Helps separate
/// cache misses caused by reasoning-replay churn from misses caused by
/// real prefix instability.
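For readers unfamiliar with the "chars/3 heuristic" mentioned in the doc comment, a rough sketch is given below. Everything beyond the division by three is an assumption: the function name is hypothetical, and whether the real code counts characters or bytes, or rounds differently, is not shown in this PR.

```rust
/// Hypothetical estimator: approximates a token count as one token per
/// three characters of replayed reasoning text (truncating division).
pub fn estimate_replay_tokens(reasoning_text: &str) -> u32 {
    let chars = reasoning_text.chars().count();
    // Saturate rather than wrap if the text is implausibly large.
    u32::try_from(chars / 3).unwrap_or(u32::MAX)
}
```

An estimate like this is only good enough to tell "a lot of replayed reasoning" from "none", which is all the `replay` column needs in order to separate replay churn from real prefix instability.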
@Hmbown Hmbown merged commit e928c00 into feat/v0.8.4 May 2, 2026
6 checks passed
@Hmbown Hmbown deleted the feat/issue-263-cache-debug-command branch May 2, 2026 01:30