working_set::summary_block churns the cache prefix every turn (`age` / `touches` interpolation) #280

@Hmbown

Follow-up to #263. The byte-diff harness in PR #279 ruled out suspects #4 and #5; this issue documents what the same harness would surface against suspect #1 (the working-set summary block).

What I found

`crates/tui/src/working_set.rs::summary_block` (lines 396-422) renders the active-paths list as:

```rust
for entry in prompt_entries {
    let age = self.turn.saturating_sub(entry.last_turn);
    let kind = if entry.is_dir { "dir" } else { "file" };
    lines.push(format!(
        "- {} ({kind}, touches: {}, last seen: {} turn(s) ago)",
        entry.path, entry.touches, age
    ));
}
```

Both `age` and `touches` change for the same path turn-over-turn:

  • `age` = `self.turn − entry.last_turn`. `self.turn` is incremented by `next_turn()` on every user message (`record_messages`). So even if the path's `last_turn` doesn't move, `age` increments by 1 every turn.
  • `touches` increments any time the path is referenced again — fine when it actually changes, but the string representation of the count changes every increment.
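
To make the churn concrete, here is a minimal sketch that applies the same format string to one entry on two consecutive turns. `Entry`, `render_line`, and `render_consecutive_turns` are hypothetical stand-ins written for this issue, not the real types in `working_set.rs`, and the path and counter values are made up.

```rust
/// Simplified stand-in for a working-set entry (hypothetical; the real
/// struct in `working_set.rs` is not reproduced here).
struct Entry {
    path: String,
    is_dir: bool,
    touches: u32,
    last_turn: u64,
}

/// Applies the format string quoted above to a single entry.
fn render_line(current_turn: u64, entry: &Entry) -> String {
    let age = current_turn.saturating_sub(entry.last_turn);
    let kind = if entry.is_dir { "dir" } else { "file" };
    format!(
        "- {} ({kind}, touches: {}, last seen: {} turn(s) ago)",
        entry.path, entry.touches, age
    )
}

/// Renders the same untouched entry on turns 6 and 7.
/// Only the turn counter moves; the entry itself is unchanged.
fn render_consecutive_turns() -> (String, String) {
    let entry = Entry {
        path: "src/main.rs".to_string(),
        is_dir: false,
        touches: 3,
        last_turn: 5,
    };
    (render_line(6, &entry), render_line(7, &entry))
}
```

The two strings differ only in the `last seen` number, which is enough to change the bytes of the block on every turn even when nothing was touched.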

`summary_block` ends up interpolated into the system prompt by `prompts.rs::system_prompt_for_mode_with_context_and_skills` (`working_set_summary` argument, lines 204-208) at a position before the historical conversation. DeepSeek's KV cache hits on the longest matching byte prefix of the request, so the moment `summary_block` produces different bytes, every byte after it is a cache miss. That includes the entire historical conversation, which should have been cached as `(system + user_1 + assistant_1 + … + user_N)`.
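
A toy model of that prefix argument, as a sketch only: it treats the request as one flat string and measures the shared byte prefix between two turns. The request layout, `toy_request`, and `common_prefix_len` are assumptions for illustration; the real cache may match at a coarser granularity than single bytes.

```rust
/// Length in bytes of the longest prefix shared by two requests.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Builds a toy request: static system text, one working-set line, then
/// the conversation history. The layout is an assumption for illustration.
fn toy_request(age: u64, history: &str) -> String {
    format!("SYSTEM\n- src/main.rs (file, touches: 3, last seen: {age} turn(s) ago)\n{history}")
}

/// Returns (shared prefix bytes, total bytes of the previous request)
/// for two consecutive turns where only `age` and the new user turn differ.
fn reusable_bytes_between_turns() -> (usize, usize) {
    let history = "user_1\nassistant_1\n";
    let prev = toy_request(1, history);
    let next = toy_request(2, &format!("{history}user_2\n"));
    (
        common_prefix_len(prev.as_bytes(), next.as_bytes()),
        prev.len(),
    )
}
```

In this model the shared prefix stops at the `age` digit, so none of the history bytes that follow it count as reusable, even though they are identical between the two requests.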

Hypothesis

Each turn, the cache misses everything from the working-set block forward. On a long session, reuse is limited to the static text that precedes the block, so prefix-cache reuse of the conversation history approaches zero. That would explain the user-reported gap vs Claude Code in #263.

Suggested fixes (pick one)

  1. Drop the volatile fields from the rendered summary. Replace the per-entry line with just `- {path} ({kind})`. Loses the "touches / last seen N turns ago" signal but immediately restores cache stability.

  2. Move the working-set block out of the cached prefix. Instead of interpolating into the system prompt, append it as a synthetic ephemeral user message at the end of the request (just before the latest user turn). The byte position is then in the volatile tail anyway.

  3. Make `age` / `touches` deterministic across same-input replays. Hard — they're inherently turn-counters. Skip.

(1) is the cheapest. (2) is the right architectural fix and matches how Anthropic's `cache_control` ephemeral markers are typically used. The maintainer should pick.
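
For option (2), a rough sketch of the message ordering, assuming a simple role/content message list. `Role`, `Message`, `assemble_request`, and the `[working set]` label are hypothetical names for illustration and do not reflect the actual request builder in this repo. Whether the provider accepts two consecutive user messages is not checked here and would need verifying.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Assembles a request so that the volatile working-set summary sits in
/// the tail: stable system prompt, then history, then an ephemeral
/// working-set note, then the latest user turn.
fn assemble_request(
    system_prompt: &str,
    history: &[Message],
    working_set_summary: Option<&str>,
    latest_user: &str,
) -> Vec<Message> {
    let mut out = Vec::with_capacity(history.len() + 3);
    out.push(Message {
        role: Role::System,
        content: system_prompt.to_string(),
    });
    out.extend_from_slice(history);
    if let Some(summary) = working_set_summary {
        // Ephemeral: rebuilt every turn and never stored in `history`.
        out.push(Message {
            role: Role::User,
            content: format!("[working set]\n{summary}"),
        });
    }
    out.push(Message {
        role: Role::User,
        content: latest_user.to_string(),
    });
    out
}
```

With this ordering, a change in `age` or `touches` only alters bytes after the history, so the system prompt and all earlier turns keep an identical prefix from one request to the next.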

How to verify the fix

Once a fix lands, run:

```
cargo test -p deepseek-tui --bin deepseek-tui --locked prompts::
```

…and add a sibling test in `working_set.rs` that observes a fixed message set, advances `next_turn()`, and asserts `summary_block` produces identical bytes for both turns when no new paths were touched. Reuse `first_divergence` from `crates/tui/src/prompts.rs` (it's intentionally generic).
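
A sketch of what that sibling test could look like, under the assumption that option (1) is chosen. `WorkingSetSketch` and its methods are hypothetical stand-ins for the real working set, and `first_divergence_sketch` is a local placeholder because the signature of the real `first_divergence` helper is not shown in this issue.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real working set; only what the test needs.
#[derive(Default)]
struct WorkingSetSketch {
    turn: u64,
    // path -> is_dir; a BTreeMap keeps the rendering order deterministic.
    entries: BTreeMap<String, bool>,
}

impl WorkingSetSketch {
    fn touch(&mut self, path: &str, is_dir: bool) {
        self.entries.insert(path.to_string(), is_dir);
    }

    fn next_turn(&mut self) {
        self.turn += 1;
    }

    /// Option (1) rendering: path and kind only, no turn-dependent fields.
    fn summary_block(&self) -> String {
        self.entries
            .iter()
            .map(|(path, is_dir)| {
                let kind = if *is_dir { "dir" } else { "file" };
                format!("- {path} ({kind})\n")
            })
            .collect()
    }
}

/// Placeholder for `first_divergence`: index of the first differing byte,
/// or `None` when both inputs are byte-identical.
fn first_divergence_sketch(a: &[u8], b: &[u8]) -> Option<usize> {
    let shared = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    if shared == a.len() && shared == b.len() {
        None
    } else {
        Some(shared)
    }
}

/// True when the turn advanced and the summary bytes did not change.
fn summary_is_stable_across_turns() -> bool {
    let mut ws = WorkingSetSketch::default();
    ws.touch("crates/tui/src/prompts.rs", false);
    ws.touch("crates/tui/src", true);
    let before = ws.summary_block();
    ws.next_turn(); // no new paths touched in between
    let after = ws.summary_block();
    ws.turn == 1 && first_divergence_sketch(before.as_bytes(), after.as_bytes()).is_none()
}
```

In the crate this would live in a `#[test]` function against the real type and the real helper; the stand-ins here only show the shape of the assertion.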

The user-facing telemetry surface for verifying real-world impact lives behind `/cache` (#278) — hit ratios should jump on long sessions once the working-set churn stops.
