[Feature] Show cache hit rate in the context tab #884

@Astro-Han

Description

What task are you trying to do?

When I work with PawWork on a long session, I want to know whether each turn is reusing the model's cache or paying full price. Right now the context tab shows raw cache token numbers ("12.3K / 4.5K") but I can't tell at a glance whether that's healthy (90%+ hit rate, normal working state) or broken (drops to single digits after a compact, model switch, or system prompt change).

This matters because prompt cache reuse is the single biggest factor in real session cost: a healthy session pays ~10% of the nominal input cost; a session that lost its cache prefix pays full price. Today this difference is invisible to the user until the bill arrives.

Which area would this change affect?

UI or design system

(The data is already collected end-to-end — this is a presentation-layer change. Underlying area is "Model harness, prompts, tools, or session mechanics" for the derived-metric calculation.)

What do you do today?

Open the context tab, look at the cacheTokens row showing cacheRead / cacheWrite as two raw numbers, and try to work out the hit rate in my head from those and inputTokens. There's no way to spot the turn where the cache broke; the tab shows only the state of the last assistant message.

Cross-tool reference: Codex CLI doesn't surface this either (open issue openai/codex#18815 asks for it in the status line). Claude Code's /cost shows totals but not per-turn hit rate. So this is a real gap across the category.

What would a good result look like?

The context tab shows a single, readable Cache Hit Rate percentage for the latest turn, with healthy/warning/critical color coding. Hovering reveals the raw read/write numbers (current behavior, preserved for power users).

Ideally, each assistant message in the conversation also gets a small cache status indicator, so the user can see which turn broke the cache prefix (e.g. the turn right after switching models, after /compact, or after a system prompt change).

What would count as done?

P0 (must have):

  • The context tab's cacheTokens row is replaced with (or augmented by) a Cache Hit Rate row showing a percentage.
  • Formula uses the standard ACP-compatible definition: cache.read / (input + cache.read + cache.write) from the latest assistant message's tokens.
  • Hovering the percentage reveals the raw read / write numbers so the existing information is not lost.
  • When the value is null or zero (e.g. provider doesn't return cache data, or first turn of a new session), the row degrades gracefully — no "0%" panic state.
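A minimal sketch of the P0 calculation, assuming a simplified token shape. `TurnTokens`, `cacheHitRate`, and `formatCacheHitRate` are hypothetical names for illustration, not identifiers from the codebase, and the "-" placeholder is an arbitrary choice:

```typescript
// Hypothetical token shape; field names mirror the formula above,
// not the actual types in the codebase.
interface TurnTokens {
  input: number;
  cacheRead?: number | null;
  cacheWrite?: number | null;
}

// Returns the hit rate as a ratio in [0, 1], or null when no rate can be derived.
function cacheHitRate(tokens: TurnTokens): number | null {
  // The provider reported no cache data at all.
  if (tokens.cacheRead == null && tokens.cacheWrite == null) return null;
  const read = tokens.cacheRead ?? 0;
  const write = tokens.cacheWrite ?? 0;
  const total = tokens.input + read + write;
  // Avoid dividing by zero on an empty turn.
  if (total <= 0) return null;
  return read / total;
}

// Renders a neutral placeholder instead of a percentage when there is no rate.
function formatCacheHitRate(rate: number | null): string {
  return rate === null ? "-" : `${Math.round(rate * 100)}%`;
}
```

In this sketch a first turn that only writes to the cache still yields 0, so the display layer would have to decide whether to style that case neutrally rather than as an alarm.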

P1 (nice to have, can be a follow-up issue):

  • Per-message cache indicator: a small dot/badge next to each assistant message in the conversation, color-coded by that turn's hit rate (green ≥90%, yellow 50% to <90%, red <50%).
  • A simple sparkline of hit rate by turn in the context tab, so the user can visually locate the turn where the cache broke.
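The P1 color bands, and the idea of locating the turn where the cache broke, could look roughly like this. The names `CacheHealth`, `classifyCacheHealth`, and `findCacheBreaks` are hypothetical, and treating a "break" as a healthy turn followed directly by a critical one is an assumption made for this sketch:

```typescript
type CacheHealth = "healthy" | "warning" | "critical" | "unknown";

// Maps a hit rate (ratio in [0, 1]) to the color bands listed above.
function classifyCacheHealth(rate: number | null): CacheHealth {
  if (rate === null || Number.isNaN(rate)) return "unknown";
  if (rate >= 0.9) return "healthy"; // green
  if (rate >= 0.5) return "warning"; // yellow
  return "critical"; // red
}

// Returns the indices of turns where the rate fell from healthy straight to critical.
// Turns without cache data (null) are skipped rather than counted as breaks.
function findCacheBreaks(rates: Array<number | null>): number[] {
  const breaks: number[] = [];
  for (let i = 1; i < rates.length; i++) {
    const prev = rates[i - 1];
    const curr = rates[i];
    if (prev == null || curr == null) continue;
    if (
      classifyCacheHealth(prev) === "healthy" &&
      classifyCacheHealth(curr) === "critical"
    ) {
      breaks.push(i);
    }
  }
  return breaks;
}
```

A sparkline could highlight the returned indices so the user sees, for example, that the turn right after a /compact is where the rate collapsed.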

What should stay out of scope?

  • No need to predict cache eligibility before sending a turn.
  • No need to surface this in the main chat view (P0 lives only in the context tab).
  • No need to optimize cache behavior — this is observability only.

Which audience does this matter to most?

Both

Extra context

Where the data already exists in the codebase:

  • Collection: packages/opencode/src/acp/agent.ts:1473, where cachedReadTokens already flows through the ACP protocol layer.
  • Aggregation: packages/app/src/components/session/session-context-metrics.ts:100-101, where cacheRead and cacheWrite are already on the Context type.
  • Display: packages/app/src/components/session/session-context-tab.tsx:220-223, where the current cacheTokens row renders the raw read/write pair.

The change is small in scope:

  1. Add a cacheHitRate: number | null field to the Context type in session-context-metrics.ts, computed in build().
  2. Add a new context.stats.cacheHitRate row in the stats array in session-context-tab.tsx (and the i18n key).
  3. Decide whether to replace or augment the existing cacheTokens row.
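As a rough sketch of steps 1 and 2, under the assumption that the real types look something like the simplified stand-ins below: `ContextSketch`, `StatRowSketch`, `withCacheHitRate`, and `cacheHitRateRow` are all hypothetical, and the actual Context type, build() function, and stats array may be shaped differently.

```typescript
// Hypothetical stand-in for the Context type in session-context-metrics.ts;
// only the cache-related fields are modeled here.
interface ContextSketch {
  inputTokens: number;
  cacheRead: number | null;
  cacheWrite: number | null;
  cacheHitRate: number | null; // new derived field, a ratio in [0, 1]
}

// Hypothetical stand-in for one entry of the stats array in session-context-tab.tsx.
interface StatRowSketch {
  labelKey: string;
  value: string;
  tooltip: string;
}

// Step 1: derive the field once, at the point where the Context is built.
function withCacheHitRate(base: Omit<ContextSketch, "cacheHitRate">): ContextSketch {
  const hasCacheData = base.cacheRead !== null || base.cacheWrite !== null;
  const read = base.cacheRead ?? 0;
  const write = base.cacheWrite ?? 0;
  const total = base.inputTokens + read + write;
  const cacheHitRate = hasCacheData && total > 0 ? read / total : null;
  return { ...base, cacheHitRate };
}

// Step 2: build the row; returning null lets the tab skip it when there is no rate.
function cacheHitRateRow(ctx: ContextSketch): StatRowSketch | null {
  if (ctx.cacheHitRate === null) return null;
  return {
    labelKey: "context.stats.cacheHitRate",
    value: `${Math.round(ctx.cacheHitRate * 100)}%`,
    // Keeps the raw read / write pair available on hover.
    tooltip: `${ctx.cacheRead ?? 0} / ${ctx.cacheWrite ?? 0}`,
  };
}
```

Keeping the raw pair in the tooltip means step 3 can go either way: the percentage row can replace the cacheTokens row without losing information, or sit next to it.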

Why this is a differentiator, not a parity feature:

Per the analysis in wiki/articles/ai-coding-tools/prompt-caching-mechanism.md, prompt caching turns a 6.00x nominal cost into a 0.945x effective cost when working normally. Compact, model switches, and system prompt changes are hard breakpoints that drop the rate to zero. Codex's open issue openai/codex#18815 is the same request for their TUI; PawWork can ship it first.

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium priority), enhancement (New feature or request)
