[Feature] Show cache hit rate in the context tab #884

### What task are you trying to do?

When I work with PawWork on a long session, I want to know whether each turn is reusing the model's cache or paying full price. Right now the context tab shows raw cache token numbers ("12.3K / 4.5K") but I can't tell at a glance whether that's healthy (90%+ hit rate, normal working state) or broken (drops to single digits after a compact, model switch, or system prompt change).

This matters because prompt cache reuse is the single biggest factor in real session cost: a healthy session pays ~10% of the nominal input cost; a session that lost its cache prefix pays full price. Today this difference is invisible to the user until the bill arrives.
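To make the "~10%" figure concrete, here is a rough sketch of the arithmetic. The multipliers (`readMultiplier = 0.1`, `writeMultiplier = 1.25`) and the `TurnUsage` shape are assumptions chosen for illustration; they are not taken from PawWork or from any specific provider's price sheet, so substitute real pricing before relying on the numbers.

```typescript
// Hypothetical token counts for one turn; `input` is assumed to exclude cached tokens.
interface TurnUsage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// Returns the turn's input cost relative to paying full price for every input token.
// The default multipliers are illustrative assumptions, not real pricing.
function relativeInputCost(
  usage: TurnUsage,
  readMultiplier = 0.1,
  writeMultiplier = 1.25,
): number | null {
  const nominal = usage.input + usage.cacheRead + usage.cacheWrite;
  if (nominal <= 0) return null; // nothing was sent, so no ratio to report
  const effective =
    usage.input +
    usage.cacheRead * readMultiplier +
    usage.cacheWrite * writeMultiplier;
  return effective / nominal;
}

// A turn that reuses almost all of its prefix vs. one that lost the cache entirely.
const healthy = relativeInputCost({ input: 1000, cacheRead: 99000, cacheWrite: 0 });
const cold = relativeInputCost({ input: 100000, cacheRead: 0, cacheWrite: 0 });
console.log(healthy, cold);
```

Under these assumed multipliers the healthy turn costs about 11% of nominal, while the cold turn costs the full 100%.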

### Which area would this change affect?

UI or design system

(The data is already collected end-to-end — this is a presentation-layer change. Underlying area is "Model harness, prompts, tools, or session mechanics" for the derived-metric calculation.)

### What do you do today?

Open the context tab, look at the `cacheTokens` row showing `cacheRead / cacheWrite` as two raw numbers, and mentally divide `cacheRead` by the sum of `inputTokens`, `cacheRead`, and `cacheWrite` to estimate the hit rate. There's no way to spot the turn where the cache broke, because the tab only shows the cumulative state as of the last assistant message.

Cross-tool reference: Codex CLI doesn't surface this either (open issue [openai/codex#18815](https://github.com/openai/codex/issues/18815) asks for it in the status line). Claude Code's `/cost` shows totals but not per-turn hit rate. So this is a real gap across the category.

### What would a good result look like?

The context tab shows a single, readable **Cache Hit Rate** percentage for the latest turn, with healthy/warning/critical color coding. Hovering reveals the raw read/write numbers (current behavior, preserved for power users).

Ideally, each assistant message in the conversation also gets a small cache status indicator, so the user can see which turn broke the cache prefix (e.g. the turn right after switching models, after `/compact`, or after a system prompt change).

### What would count as done?

**P0 (must have):**
- The context tab's `cacheTokens` row is replaced with (or augmented by) a `Cache Hit Rate` row showing a percentage.
- Formula uses the standard ACP-compatible definition: `cache.read / (input + cache.read + cache.write)` from the latest assistant message's tokens.
- Hovering the percentage reveals the raw `read / write` numbers so the existing information is not lost.
- When the value is `null` or zero (e.g. the provider doesn't return cache data, or it is the first turn of a new session), the row degrades gracefully instead of showing a "0%" panic state.
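A minimal sketch of the P0 formula, assuming a token shape along the lines of the formula's field names. `TurnTokens` and `cacheHitRate` are hypothetical names used for illustration; the real `Context` type and `build()` function may look different.

```typescript
// Hypothetical token shape for one assistant message.
// `cache` is optional because some providers may not report cache data.
interface TurnTokens {
  input: number;
  cache?: { read: number; write: number };
}

// Implements cache.read / (input + cache.read + cache.write).
// Returns null when there is no cache data or nothing to divide by,
// so the UI can degrade gracefully instead of rendering a misleading 0%.
function cacheHitRate(tokens: TurnTokens): number | null {
  if (!tokens.cache) return null;
  const { read, write } = tokens.cache;
  const total = tokens.input + read + write;
  if (!Number.isFinite(total) || total <= 0) return null;
  return read / total;
}

// Using the example numbers from the current row ("12.3K / 4.5K") plus 500 uncached tokens.
const rate = cacheHitRate({ input: 500, cache: { read: 12300, write: 4500 } });
console.log(rate === null ? "n/a" : `${Math.round(rate * 100)}%`);
```

Returning `null` rather than `0` for missing data keeps "unknown" distinct from "cache missed", which is what lets the row avoid a false alarm on providers that report nothing.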

**P1 (nice to have, can be a follow-up issue):**
- Per-message cache indicator: a small dot/badge next to each assistant message in the conversation, color-coded by that turn's hit rate (green ≥90%, yellow 50% to <90%, red <50%).
- A simple sparkline of hit rate by turn in the context tab, so the user can visually locate the turn where the cache broke.
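The P1 thresholds could be sketched roughly as follows. `CacheHealth`, `classifyHitRate`, and `firstCacheBreak` are hypothetical helpers, not existing PawWork code, and treating exactly 90% as green is an assumption about the intended boundary.

```typescript
type CacheHealth = "healthy" | "warning" | "critical" | "unknown";

// Maps a hit rate in [0, 1] to the badge color bucket:
// green >= 90%, yellow 50% up to (not including) 90%, red below 50%.
function classifyHitRate(rate: number | null | undefined): CacheHealth {
  if (rate == null || Number.isNaN(rate)) return "unknown";
  if (rate >= 0.9) return "healthy";
  if (rate >= 0.5) return "warning";
  return "critical";
}

// Finds the first turn whose hit rate is critical right after a turn that was not,
// which is the point a sparkline would help the user locate. Returns -1 if none.
function firstCacheBreak(rates: Array<number | null>): number {
  let previousWasUsable = false;
  for (let i = 0; i < rates.length; i++) {
    const current = classifyHitRate(rates[i]);
    if (current === "unknown") continue; // skip turns with no cache data
    if (current === "critical" && previousWasUsable) return i;
    previousWasUsable = current !== "critical";
  }
  return -1;
}

// Turn 0 has no data; turn 3 is where the cache prefix was lost.
console.log(firstCacheBreak([null, 0.92, 0.95, 0.03, 0.88]));
```

Skipping unknown turns means a provider that omits cache data for one message does not hide or fake a break.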

### What should stay out of scope?

- No need to predict cache eligibility before sending a turn.
- No need to surface this in the main chat view (P0 lives only in the context tab).
- No need to optimize cache behavior — this is observability only.

### Which audience does this matter to most?

Both

### Extra context

**Where the data already exists in the codebase:**

- Collection: `packages/opencode/src/acp/agent.ts:1473` — `cachedReadTokens` already flows through the ACP protocol layer.
- Aggregation: `packages/app/src/components/session/session-context-metrics.ts:100-101` — `cacheRead` and `cacheWrite` are already on the `Context` type.
- Display: `packages/app/src/components/session/session-context-tab.tsx:220-223` — current `cacheTokens` row renders the raw read/write pair.

The change is small in scope:

1. Add a `cacheHitRate: number | null` field to the `Context` type in `session-context-metrics.ts`, computed in `build()`.
2. Add a new `context.stats.cacheHitRate` row in the `stats` array in `session-context-tab.tsx` (and the i18n key).
3. Decide whether to replace or augment the existing `cacheTokens` row.
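As a rough illustration of step 2, the new row could carry the percentage as its value and the existing raw pair as hover text. `StatRow`, `compact`, and `cacheHitRateRow` are hypothetical; the actual `stats` entries in `session-context-tab.tsx` may have a different shape, and the `"n/a"` placeholder is only one possible way to degrade.

```typescript
// Hypothetical row shape for the stats list.
interface StatRow {
  labelKey: string; // i18n key, resolved by the UI layer
  value: string;
  tooltip?: string;
}

// Formats token counts the way the current row displays them, e.g. 12300 -> "12.3K".
function compact(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}K` : String(n);
}

// Builds the Cache Hit Rate row. The raw read/write pair moves into the tooltip
// so the existing information stays available on hover.
function cacheHitRateRow(
  rate: number | null,
  cacheRead: number,
  cacheWrite: number,
): StatRow {
  const tooltip = `${compact(cacheRead)} / ${compact(cacheWrite)}`;
  const value =
    rate === null || rate === 0 ? "n/a" : `${Math.round(rate * 100)}%`;
  return { labelKey: "context.stats.cacheHitRate", value, tooltip };
}

console.log(cacheHitRateRow(0.71, 12300, 4500));
```

Keeping the formatting in one pure function makes the replace-or-augment decision in step 3 a rendering choice rather than a data change.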

**Why this is a differentiator, not a parity feature:**

Per the analysis in [wiki/articles/ai-coding-tools/prompt-caching-mechanism.md](https://github.com/Astro-Han/wiki/blob/main/articles/ai-coding-tools/prompt-caching-mechanism.md), prompt caching turns a 6.00x nominal cost into a 0.945x effective cost when working normally. Compact, model switches, and system prompt changes are hard breakpoints that drop the rate to zero. Codex's open issue [openai/codex#18815](https://github.com/openai/codex/issues/18815) is the same request for their TUI; PawWork can ship it first.

