feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss #278
…/miss

Step 1 of #263. Without per-turn telemetry the prefix-cache audit is unfounded speculation; the rest of the issue's investigation steps depend on this surface.

The DeepSeek API already returns `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` per turn, and we already store the *latest* values on App. This adds a 50-turn ring (`turn_cache_history`) populated at the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]` slash command that renders a fixed-width table of the last N turns with per-turn ratios and a session aggregate. Default count is 10; larger values clamp to the ring size.

Edge cases the formatter handles:
- No telemetry yet → friendly "no turns recorded" message.
- `cache_hit_tokens = None` (provider didn't report) → the cache columns render em-dashes and the row is excluded from the hit/miss aggregates, so one missing-telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark the cell with `*`. The footer documents the asterisk.
- Ring at cap (50) → push evicts the oldest record.

Tests cover all four paths, including the cap.
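The capped ring described above can be sketched with a `VecDeque` that drops its front entry once full. This is a minimal sketch, not the PR's code: the field types of `TurnCacheRecord` are inferred from the formatter snippets in this PR (`u32` token counts, an `Instant` timestamp), and `push_capped` is a hypothetical stand-in for the PR's capped push helper, with the cap as a free constant instead of `App::TURN_CACHE_HISTORY_CAP`.

```rust
use std::collections::VecDeque;
use std::time::Instant;

/// Assumed shape of the PR's per-turn record; field names follow the
/// formatter snippets, the concrete types are guesses.
#[derive(Debug, Clone)]
pub struct TurnCacheRecord {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_hit_tokens: Option<u32>,
    pub cache_miss_tokens: Option<u32>,
    pub reasoning_replay_tokens: Option<u32>,
    pub recorded_at: Instant,
}

/// Ring size (50 in the PR).
pub const TURN_CACHE_HISTORY_CAP: usize = 50;

/// Hypothetical helper: append a record, evicting the oldest entries first
/// so the ring never holds more than `TURN_CACHE_HISTORY_CAP` records.
pub fn push_capped(history: &mut VecDeque<TurnCacheRecord>, rec: TurnCacheRecord) {
    while history.len() >= TURN_CACHE_HISTORY_CAP {
        history.pop_front();
    }
    history.push_back(rec);
}
```

Evicting before `push_back` keeps the deque at exactly 50 entries, and the record dropped is always the oldest one, which is the "push evicts oldest" edge case.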
Code Review
This pull request introduces a new /cache debug command to display per-turn DeepSeek prefix-cache telemetry. The changes include a new TurnCacheRecord struct, a capped history buffer in the application state, and logic to format the telemetry into a table. Review feedback identifies several alignment and scalability issues in the table rendering, specifically recommending increased column widths for large token counts and consistent padding for ratio strings and separators.
```rust
header.push_str("turn in out hit miss replay ratio age\n");
header.push_str(&"─".repeat(76));
header.push('\n');
```
The table header and separator are misaligned with the row format string. Additionally, the column widths for token counts (`in`, `out`, `hit`, `miss`, `replay`) are set to 5 or 6, which is insufficient for models with large context windows (e.g., DeepSeek-V3's 128k context results in 6-digit token counts). Rendering the header with the same `format!` widths as the rows and increasing the token columns to 7 characters keeps them aligned and leaves headroom.
```diff
-header.push_str("turn in out hit miss replay ratio age\n");
-header.push_str(&"─".repeat(76));
-header.push('\n');
+header.push_str(&format!(
+    "{:>4} {:>7} {:>7} {:>7} {:>7} {:>7} {:>6} {}\n",
+    "turn", "in", "out", "hit", "miss", "replay", "ratio", "age"
+));
+header.push_str(&"─".repeat(64));
+header.push('\n');
```
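The reviewer's underlying point is that the header and the rows should share one set of column widths. A minimal sketch of that idea, assuming a hypothetical `table_line` helper (not part of the PR) through which both the header and every data row are rendered, using the 7-character token columns and 6-character ratio column from the suggestion above:

```rust
/// Hypothetical helper: render one table line. Because the header and each
/// data row go through the same format string, their columns cannot drift.
fn table_line(cells: [&str; 8]) -> String {
    let [turn, input, output, hit, miss, replay, ratio, age] = cells;
    // Widths: turn 4, token columns 7, ratio 6; `age` is left unpadded.
    format!("{turn:>4} {input:>7} {output:>7} {hit:>7} {miss:>7} {replay:>7} {ratio:>6} {age}")
}
```

With this layout a 6-digit count such as `131072` still fits with a space to spare, and the `age` column starts at the same character offset (52) in the header and in every row.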
```rust
totals_input += u64::from(rec.input_tokens);

let replay_cell = rec
    .reasoning_replay_tokens
    .map_or_else(|| "—".to_string(), |t| t.to_string());
let age = humanize_age(now.saturating_duration_since(rec.recorded_at));

// No cache telemetry → render `—` everywhere and don't pollute totals
// with inferred zeros. Some providers (and some routes inside DeepSeek)
// skip the cache fields; including a synthesized 0/N for those turns
// would make every aggregate ratio look broken.
let Some(hit) = rec.cache_hit_tokens else {
    body.push_str(&format!(
        "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio:>6} {age}\n",
        turn = turn_index,
        input = rec.input_tokens,
        output = rec.output_tokens,
        hit = "—",
        miss = "—",
        replay = replay_cell,
        ratio = "—",
        age = age,
    ));
    continue;
};

let miss_reported = rec.cache_miss_tokens;
let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
let accounted = u64::from(hit) + u64::from(miss);
let ratio = if accounted == 0 {
    " —".to_string()
} else {
    format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
};
totals_hit += u64::from(hit);
totals_miss += u64::from(miss);

let miss_cell = match miss_reported {
    Some(_) => format!("{miss}"),
    None => format!("{miss}*"),
};

body.push_str(&format!(
    "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio} {age}\n",
    turn = turn_index,
    input = rec.input_tokens,
    output = rec.output_tokens,
    hit = hit,
    miss = miss_cell,
    replay = replay_cell,
    ratio = ratio,
    age = age,
));
```
There are several issues in the row rendering logic:
- `totals_input` is incremented for every turn, including those without telemetry. This makes the footer `Σ in` inconsistent with `Σ hit + Σ miss`, which is confusing when auditing cache performance. It should only sum turns that contribute to the telemetry aggregates.
- The `ratio` string for the `accounted == 0` case (line 206) is 5 characters long (`" —"`), while the numeric ratio (line 208) and the "no telemetry" case (line 196) are 6 characters long, causing misalignment.
- Column widths should be increased to 7 to accommodate larger token counts.
```rust
for (i, rec) in rows.iter().enumerate() {
    let turn_index = absolute_start + i + 1;
    let replay_cell = rec
        .reasoning_replay_tokens
        .map_or_else(|| "—".to_string(), |t| t.to_string());
    let age = humanize_age(now.saturating_duration_since(rec.recorded_at));
    // No cache telemetry → render `—` everywhere and don't pollute totals
    // with inferred zeros. Some providers (and some routes inside DeepSeek)
    // skip the cache fields; including a synthesized 0/N for those turns
    // would make every aggregate ratio look broken.
    let Some(hit) = rec.cache_hit_tokens else {
        body.push_str(&format!(
            "{turn:>4} {input:>7} {output:>7} {hit:>7} {miss:>7} {replay:>7} {ratio:>6} {age}\n",
            turn = turn_index,
            input = rec.input_tokens,
            output = rec.output_tokens,
            hit = "—",
            miss = "—",
            replay = replay_cell,
            ratio = "—",
            age = age,
        ));
        continue;
    };
    totals_input += u64::from(rec.input_tokens);
    let miss_reported = rec.cache_miss_tokens;
    let miss = miss_reported.unwrap_or_else(|| rec.input_tokens.saturating_sub(hit));
    let accounted = u64::from(hit) + u64::from(miss);
    let ratio = if accounted == 0 {
        " —".to_string()
    } else {
        format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
    };
    totals_hit += u64::from(hit);
    totals_miss += u64::from(miss);
    let miss_cell = match miss_reported {
        Some(_) => format!("{miss}"),
        None => format!("{miss}*"),
    };
    body.push_str(&format!(
        "{turn:>4} {input:>7} {output:>7} {hit:>7} {miss:>7} {replay:>7} {ratio} {age}\n",
        turn = turn_index,
        input = rec.input_tokens,
        output = rec.output_tokens,
        hit = hit,
        miss = miss_cell,
        replay = replay_cell,
        ratio = ratio,
        age = age,
    ));
}
```

```rust
footer.push_str(&"─".repeat(76));
footer.push('\n');
```
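The first review point (only sum `Σ in` over turns that carry telemetry) can be illustrated in isolation. This is a sketch under assumptions: `Turn`, `Totals`, `aggregate`, and `avg_ratio` are hypothetical names, not the PR's types, and only the fields the formatter reads are modeled.

```rust
/// Hypothetical per-turn input: just the fields the aggregates need.
#[derive(Debug, Clone, Copy)]
struct Turn {
    input: u32,
    hit: Option<u32>,
    miss: Option<u32>,
}

#[derive(Debug, Default, PartialEq)]
struct Totals {
    input: u64,
    hit: u64,
    miss: u64,
}

/// Sum only turns that reported a hit count, so Σ in and Σ hit + Σ miss
/// are computed over the same set of turns.
fn aggregate(turns: &[Turn]) -> Totals {
    let mut t = Totals::default();
    for turn in turns {
        // No cache telemetry: skip the turn for every total, including input.
        let Some(hit) = turn.hit else { continue };
        // Infer miss as input - hit when the provider omitted it.
        let miss = turn.miss.unwrap_or_else(|| turn.input.saturating_sub(hit));
        t.input += u64::from(turn.input);
        t.hit += u64::from(hit);
        t.miss += u64::from(miss);
    }
    t
}

/// Session hit ratio formatted like the footer, or None when nothing was accounted.
fn avg_ratio(t: &Totals) -> Option<String> {
    let accounted = t.hit + t.miss;
    (accounted > 0).then(|| format!("{:.1}%", 100.0 * t.hit as f64 / accounted as f64))
}
```

Fed the four sample turns from the PR description, this yields `Σ in` of 15000 instead of 16000, which now equals `Σ hit + Σ miss` (8500 + 6500); the average hit ratio stays at 56.7%.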
Pull request overview
Adds a new debug surface to make DeepSeek per-turn prefix-cache telemetry visible in the TUI, enabling measurable cache-hit auditing for issue #263.
Changes:
- Record per-turn cache telemetry into a capped (50) `VecDeque` on `App`.
- Add a `/cache [count]` slash command to render recent turns as a copy/paste-friendly table with aggregates.
- Register the new command in the command registry and add unit tests for edge cases and capping.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `crates/tui/src/tui/ui.rs` | Appends a `TurnCacheRecord` at turn finalization time using usage telemetry. |
| `crates/tui/src/tui/app.rs` | Introduces `TurnCacheRecord`, `turn_cache_history`, and a capped push helper. |
| `crates/tui/src/commands/mod.rs` | Registers the `/cache` command in the command list and dispatcher. |
| `crates/tui/src/commands/debug.rs` | Implements `/cache` rendering/formatting and adds tests for edge cases and capping. |
```rust
pub fn cache(app: &mut App, arg: Option<&str>) -> CommandResult {
    let want = arg
        .and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(10);
    let cap = app.turn_cache_history.len();
    let count = want
        .min(cap)
        .min(crate::tui::app::App::TURN_CACHE_HISTORY_CAP);
```
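The snippet above silently falls back to 10 when the argument does not parse, and binds the number of recorded turns to a variable named `cap`. The sketch below is one possible variant, not the PR's behavior: a hypothetical `resolve_count` that keeps the default of 10 for a missing or blank argument, rejects unparseable input with a hypothetical usage message, and names the recorded length `recorded` so it is not confused with the ring capacity.

```rust
/// Default row count for `/cache` with no argument (10 in the PR).
const DEFAULT_COUNT: usize = 10;

/// Hypothetical helper: resolve `/cache [count]` to the number of rows to
/// print, clamped to the turns recorded so far and to the ring capacity.
fn resolve_count(arg: Option<&str>, recorded: usize, ring_cap: usize) -> Result<usize, String> {
    let want = match arg.map(str::trim).filter(|s| !s.is_empty()) {
        // Missing or blank argument: use the default.
        None => DEFAULT_COUNT,
        // Anything else must parse; the error text here is illustrative only.
        Some(s) => s
            .parse::<usize>()
            .map_err(|_| format!("usage: /cache [count] (got {s:?})"))?,
    };
    Ok(want.min(recorded).min(ring_cap))
}
```

Whether a typo like `/cache ten` should error or quietly show the default is a UX call; the sketch picks the error so the user learns the argument was ignored.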
```rust
    "Cache telemetry — last {} of {} turn(s) (model: {})\n",
    rows.len(),
    total,
    app.model
```
```rust
body.push_str(&format!(
    "{turn:>4} {input:>5} {output:>5} {hit:>5} {miss:>5} {replay:>6} {ratio} {age}\n",
    turn = turn_index,
```
```rust
let ratio = if accounted == 0 {
    " —".to_string()
} else {
    format!("{:>5.1}%", 100.0 * f64::from(hit) / accounted as f64)
};
```
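One way to make every branch of the ratio cell the same width is to build the text first and pad it once at the end. A minimal sketch, assuming a hypothetical `ratio_cell` helper and a 6-character column (wide enough for anything from `0.0%` to `100.0%`); the placeholder is the same dash (U+2014) the PR uses, written as an escape.

```rust
/// Width of the ratio column; "100.0%" is the widest possible value.
const RATIO_WIDTH: usize = 6;

/// Hypothetical helper: render the ratio cell so the "no telemetry",
/// "nothing accounted", and numeric cases all occupy RATIO_WIDTH chars.
fn ratio_cell(hit: Option<u32>, miss: u32) -> String {
    let text = match hit {
        Some(h) if u64::from(h) + u64::from(miss) > 0 => {
            let accounted = (u64::from(h) + u64::from(miss)) as f64;
            format!("{:.1}%", 100.0 * f64::from(h) / accounted)
        }
        // No telemetry, or zero tokens accounted: a single dash placeholder.
        _ => String::from("\u{2014}"),
    };
    // Pad once, in one place, so no branch can end up narrower than another.
    format!("{:>width$}", text, width = RATIO_WIDTH)
}
```

Rust's width padding counts `char`s, so the one-character dash is padded to the same six columns as `75.0%`.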
```rust
footer.push_str(&"─".repeat(76));
footer.push('\n');
footer.push_str(&format!(
    "Σ in: {totals_input} Σ hit: {totals_hit} Σ miss: {totals_miss} avg hit ratio: {avg_ratio}\n",
```
```rust
    "* miss inferred from input − hit when the provider did not report it explicitly.\n",
);
footer.push_str(
    "Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; \n\
```
```rust
/// V4-thinking tool-calling turns (chars/3 heuristic). Helps separate
/// cache misses caused by reasoning-replay churn from misses caused by
/// real prefix instability.
```
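The doc comment above says replayed reasoning tokens are estimated with a chars/3 heuristic. A sketch of that estimate, assuming (the snippet does not say) that it counts Unicode scalar values rather than bytes and rounds down; `estimate_replay_tokens` is a hypothetical name.

```rust
/// Hypothetical helper: rough token estimate for replayed reasoning text
/// using chars/3. Counts `char`s, not bytes, and rounds down (both assumed).
fn estimate_replay_tokens(reasoning: &str) -> u32 {
    let chars = reasoning.chars().count();
    // Saturate rather than wrap on absurdly long inputs.
    u32::try_from(chars / 3).unwrap_or(u32::MAX)
}
```

Counting characters instead of bytes keeps the estimate stable for non-ASCII reasoning text, where a byte count would roughly double or triple it.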
Summary
Step 1 of #263 — without per-turn cache telemetry on screen the prefix-cache audit is unfounded speculation. This adds the foundation.
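The DeepSeek API already returns `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` per turn, and we already store the latest values on `App`. This adds a 50-turn history ring on `App` and a `/cache [count]` slash command that renders recent turns. Sample output: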
```
Cache telemetry — last 4 of 4 turn(s) (model: deepseek-v4-pro)
────────────────────────────────────────────────────────────────────────────
turn in out hit miss replay ratio age
────────────────────────────────────────────────────────────────────────────
1 4000 200 3000 1000 — 75.0% 0s
2 6000 250 3000 3000 150 50.0% 0s
3 5000 100 2500 2500* — 50.0% 0s
4 1000 50 — — — — 0s
────────────────────────────────────────────────────────────────────────────
Σ in: 16000 Σ hit: 8500 Σ miss: 6500 avg hit ratio: 56.7%
Hit/miss ratios over ~70% after the third turn indicate a stable cache prefix; …
```
Edge cases handled by the formatter
All four paths are covered by tests; the regression test `turn_cache_history_is_capped_at_50` pins the cap.
What this unlocks
With per-turn telemetry on screen, step 2 of the audit (a byte-diff harness) can be driven by measurement, and step 3 (suspect-by-suspect bisection) can verify each fix by watching the ratio jump in `/cache`.
Test plan
🤖 Generated with Claude Code