Follow-up to #263. The byte-diff harness in PR #279 ruled out suspects #4–#5; this issue documents what the same harness would surface against suspect #1 (the working-set summary block).
What I found
`crates/tui/src/working_set.rs::summary_block` (lines 396-422) renders the active-paths list as:
```rust
for entry in prompt_entries {
    let age = self.turn.saturating_sub(entry.last_turn);
    let kind = if entry.is_dir { "dir" } else { "file" };
    lines.push(format!(
        "- {} ({kind}, touches: {}, last seen: {} turn(s) ago)",
        entry.path, entry.touches, age
    ));
}
```
`age` and `touches` can both change for the same path from one turn to the next:
- `age` = `self.turn - entry.last_turn`. `self.turn` is incremented by `next_turn()` on every user message (`record_messages`), so even if the path's `last_turn` doesn't move, `age` grows by 1 every turn.
- `touches` increments any time the path is referenced again. That is fine semantically, but every increment changes the rendered string.
`summary_block` ends up interpolated into the system prompt by `prompts.rs::system_prompt_for_mode_with_context_and_skills` (`working_set_summary` argument, lines 204-208), at a position before the historical conversation. DeepSeek's KV cache hits on the longest matching byte prefix of the request, so the moment `summary_block` produces different bytes, every byte after it is a cache miss. That includes the entire historical conversation, which should have been cached as `(system + user_1 + assistant_1 + … + user_N)`.
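To make the prefix argument concrete, here is a minimal sketch that renders the same entry on two consecutive turns and measures how many leading bytes the two requests share. `Entry`, `render`, `common_prefix_len`, and `shared_prefix_demo` are hypothetical stand-ins written for this issue, not the real types or helpers in `working_set.rs`; only the format string is copied from the excerpt above.

```rust
// Hypothetical stand-in for the real entry type in working_set.rs.
struct Entry {
    path: &'static str,
    is_dir: bool,
    touches: u32,
    last_turn: u64,
}

/// Renders one line per entry using the format string from the excerpt above.
fn render(turn: u64, entries: &[Entry]) -> String {
    let mut lines = Vec::new();
    for entry in entries {
        let age = turn.saturating_sub(entry.last_turn);
        let kind = if entry.is_dir { "dir" } else { "file" };
        lines.push(format!(
            "- {} ({kind}, touches: {}, last seen: {} turn(s) ago)",
            entry.path, entry.touches, age
        ));
    }
    lines.join("\n")
}

/// Number of leading bytes that two strings have in common.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

/// Builds two consecutive-turn requests with identical history and returns
/// (shared prefix length, total length of the first request).
fn shared_prefix_demo() -> (usize, usize) {
    let entries = [Entry { path: "src/main.rs", is_dir: false, touches: 3, last_turn: 4 }];
    // Placeholder for the conversation that follows the system prompt.
    let history = "\n[user_1][assistant_1][user_2]";
    let turn_5 = format!("{}{}", render(5, &entries), history);
    let turn_6 = format!("{}{}", render(6, &entries), history);
    (common_prefix_len(&turn_5, &turn_6), turn_5.len())
}
```

Nothing touched the path between the two turns, yet the shared prefix stops inside the summary line at the `age` digit, so the identical history that follows can never be part of the matching prefix.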
Hypothesis
Each turn the cache misses everything from the working-set block forward. On a long session this approximates zero prefix-cache reuse, which would explain the user-reported gap vs Claude Code in #263.
Suggested fixes (pick one)
1. Drop the volatile fields from the rendered summary. Replace the per-entry line with just `- {path} ({kind})`. Loses the "touches / last seen N turns ago" signal but immediately restores cache stability.
2. Move the working-set block out of the cached prefix. Instead of interpolating into the system prompt, append it as a synthetic ephemeral user message at the end of the request (just before the latest user turn). The byte position is then in the volatile tail anyway.
3. Make `age` / `touches` deterministic across same-input replays. Hard, because they're inherently turn-counters. Skip.
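A rough sketch of what the first option could look like, assuming the entry exposes `path` and `is_dir` as in the excerpt. `Entry` and `render_stable` are hypothetical names for illustration, not the real code:

```rust
// Hypothetical stand-in for the real entry type; only the fields that
// survive in the stable rendering are modelled here.
struct Entry {
    path: String,
    is_dir: bool,
}

/// Renders `- {path} ({kind})` per entry. No turn counter or touch count is
/// interpolated, so the output depends only on which paths are present.
fn render_stable(entries: &[Entry]) -> String {
    entries
        .iter()
        .map(|e| {
            let kind = if e.is_dir { "dir" } else { "file" };
            format!("- {} ({kind})", e.path)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

With this shape the bytes only change when the set or order of paths changes, so the ordering of `prompt_entries` would also need to be stable for the prefix to hold.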
(1) is the cheapest. (2) is the right architectural fix and matches how Anthropic's `cache_control` ephemeral markers are typically used. The maintainer should pick.
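For the second option, the request assembly might look roughly like the sketch below. `Message`, `build_request`, and the `[working set]` label are assumptions made up for illustration; the real request types in the crate will differ.

```rust
// Hypothetical message type; the crate's real request structs will differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

/// Assembles a request with the working-set summary in the volatile tail:
/// static system prompt, then history, then a synthetic user message with the
/// summary, then the latest user turn.
fn build_request(
    system: &str,
    history: &[Message],
    working_set: &str,
    latest_user: &str,
) -> Vec<Message> {
    let mut out = Vec::with_capacity(history.len() + 3);
    out.push(Message { role: "system", content: system.to_string() });
    out.extend_from_slice(history);
    // Synthetic message: rebuilt every turn and never stored in history.
    out.push(Message {
        role: "user",
        content: format!("[working set]\n{working_set}"),
    });
    out.push(Message { role: "user", content: latest_user.to_string() });
    out
}
```

Because the summary now sits after the history, a change in its bytes only invalidates the last two messages, which were going to be cache misses anyway.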
How to verify the fix
Once a fix lands, run:
```
cargo test -p deepseek-tui --bin deepseek-tui --locked prompts::
```
…and add a sibling test in `working_set.rs` that observes a fixed message set, advances `next_turn()`, and asserts `summary_block` produces identical bytes for both turns when no new paths were touched. Reuse `first_divergence` from `crates/tui/src/prompts.rs` (it's intentionally generic).
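The sibling test could take roughly the following shape. Everything here is a self-contained stand-in: `WorkingSet`, `Entry`, and the local `first_divergence` are hypothetical and only mirror the names used in this issue. The real `first_divergence` in `prompts.rs` may have a different signature, and the real `summary_block` is assumed to have been changed along the lines of option 1.

```rust
// Hypothetical stand-ins; the real types live in working_set.rs.
struct Entry {
    path: String,
    is_dir: bool,
}

struct WorkingSet {
    turn: u64,
    entries: Vec<Entry>,
}

impl WorkingSet {
    /// Advances the turn counter, as the real `next_turn()` is described to do.
    fn next_turn(&mut self) {
        self.turn += 1;
    }

    /// Assumed post-fix rendering: no turn-dependent fields are interpolated.
    fn summary_block(&self) -> String {
        self.entries
            .iter()
            .map(|e| {
                let kind = if e.is_dir { "dir" } else { "file" };
                format!("- {} ({kind})", e.path)
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Local stand-in for the generic helper: byte index of the first difference,
/// or `None` when both inputs are byte-identical.
fn first_divergence(a: &[u8], b: &[u8]) -> Option<usize> {
    let shared = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    if shared == a.len() && shared == b.len() {
        None
    } else {
        Some(shared)
    }
}

#[test]
fn summary_block_is_stable_when_no_paths_are_touched() {
    let mut ws = WorkingSet {
        turn: 1,
        entries: vec![Entry { path: "src/main.rs".to_string(), is_dir: false }],
    };
    let before = ws.summary_block();
    ws.next_turn();
    let after = ws.summary_block();
    assert_eq!(first_divergence(before.as_bytes(), after.as_bytes()), None);
}
```

Run against the current rendering, a test like this would be expected to fail at the byte where `age` is printed, which is the regression it is meant to guard.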
The user-facing telemetry surface for verifying real-world impact lives behind `/cache` (#278) — hit ratios should jump on long sessions once the working-set churn stops.