What task are you trying to do?
When I work with PawWork on a long session, I want to know whether each turn is reusing the model's cache or paying full price. Right now the context tab shows raw cache token numbers ("12.3K / 4.5K") but I can't tell at a glance whether that's healthy (90%+ hit rate, normal working state) or broken (drops to single digits after a compact, model switch, or system prompt change).
This matters because prompt cache reuse is the single biggest factor in real session cost: a healthy session pays ~10% of the nominal input cost; a session that lost its cache prefix pays full price. Today this difference is invisible to the user until the bill arrives.
Which area would this change affect?
UI or design system
(The data is already collected end-to-end — this is a presentation-layer change. Underlying area is "Model harness, prompts, tools, or session mechanics" for the derived-metric calculation.)
What do you do today?
Open the context tab, look at the cacheTokens row showing cacheRead / cacheWrite as two raw numbers, and try to mentally divide them against inputTokens to figure out the hit rate. There's no way to spot the turn where the cache broke — only the cumulative state of the last assistant message.
Cross-tool reference: Codex CLI doesn't surface this either (open issue openai/codex#18815 asks for it in the status line). Claude Code's /cost shows totals but not per-turn hit rate. So this is a real gap across the category.
What would a good result look like?
The context tab shows a single, readable Cache Hit Rate percentage for the latest turn, with healthy/warning/critical color coding. Hovering reveals the raw read/write numbers (current behavior, preserved for power users).
Ideally, each assistant message in the conversation also gets a small cache status indicator, so the user can see which turn broke the cache prefix (e.g. the turn right after switching models, after /compact, or after a system prompt change).
What would count as done?
P0 (must have):
- The context tab's cacheTokens row is replaced with (or augmented by) a Cache Hit Rate row showing a percentage.
- Formula uses the standard ACP-compatible definition: cache.read / (input + cache.read + cache.write) from the latest assistant message's tokens.
- Hovering the percentage reveals the raw read / write numbers so the existing information is not lost.
- When the value is null or zero (e.g. provider doesn't return cache data, or first turn of a new session), the row degrades gracefully, with no "0%" panic state.
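A minimal sketch of the P0 calculation and label, to make the formula and the null/zero handling concrete. The TurnTokens shape and the function names cacheHitRate and formatCacheHitRate are hypothetical, not taken from the codebase, and the "n/a" placeholder is only one possible way to degrade gracefully.

```typescript
// Hypothetical token shape for one assistant message; the real fields in
// session-context-metrics.ts may be named or nested differently.
interface TurnTokens {
  input: number;
  cache?: { read?: number | null; write?: number | null } | null;
}

// cache.read / (input + cache.read + cache.write), as a fraction in [0, 1].
// Returns null when the provider reported no cache data or the token total is zero.
function cacheHitRate(tokens: TurnTokens): number | null {
  const read = tokens.cache?.read;
  if (read == null) return null;
  const write = tokens.cache?.write ?? 0;
  const total = tokens.input + read + write;
  if (total <= 0) return null;
  return read / total;
}

// Label for the context tab row. Null and zero both map to a neutral
// placeholder (an assumed choice) so the row never shows a "0%" panic state.
function formatCacheHitRate(rate: number | null): string {
  if (rate === null || rate <= 0) return "n/a";
  return `${Math.round(rate * 100)}%`;
}
```

For example, a turn with 1,000 uncached input tokens, 9,000 cache-read tokens, and no cache writes gives 9000 / 10000, which this sketch renders as "90%".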
P1 (nice to have, can be a follow-up issue):
- Per-message cache indicator: a small dot/badge next to each assistant message in the conversation, color-coded by that turn's hit rate (green ≥90%, yellow 50–90%, red <50%).
- A simple sparkline of hit rate by turn in the context tab, so the user can visually locate the turn where the cache broke.
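The P1 color thresholds could be sketched as below. The names CacheHealth, classifyCacheHealth, and findCacheBreaks are hypothetical, and treating a healthy-to-critical drop as "the turn where the cache broke" is an assumed heuristic rather than anything defined in the codebase.

```typescript
type CacheHealth = "healthy" | "warning" | "critical" | "unknown";

// Thresholds from the list above: green >= 90%, yellow 50-90%, red < 50%.
// Rates are fractions in [0, 1]; null means the turn has no cache data.
function classifyCacheHealth(rate: number | null): CacheHealth {
  if (rate === null || Number.isNaN(rate)) return "unknown";
  if (rate >= 0.9) return "healthy";
  if (rate >= 0.5) return "warning";
  return "critical";
}

// Hypothetical helper: indices of turns whose hit rate fell from healthy to
// critical relative to the previous turn that had cache data.
function findCacheBreaks(rates: Array<number | null>): number[] {
  const breaks: number[] = [];
  let previous: CacheHealth = "unknown";
  rates.forEach((rate, turn) => {
    const current = classifyCacheHealth(rate);
    if (current === "unknown") return; // skip turns without cache data
    if (previous === "healthy" && current === "critical") breaks.push(turn);
    previous = current;
  });
  return breaks;
}
```

With per-turn rates of [null, 0.95, 0.93, 0.02, 0.6, 0.94], this sketch flags turn index 3, which is the kind of turn a sparkline or per-message badge would make visible.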
What should stay out of scope?
- No need to predict cache eligibility before sending a turn.
- No need to surface this in the main chat view (P0 lives only in the context tab).
- No need to optimize cache behavior — this is observability only.
Which audience does this matter to most?
Both
Extra context
Where the data already exists in the codebase:
- Collection: packages/opencode/src/acp/agent.ts:1473 (cachedReadTokens already flows through the ACP protocol layer).
- Aggregation: packages/app/src/components/session/session-context-metrics.ts:100-101 (cacheRead and cacheWrite are already on the Context type).
- Display: packages/app/src/components/session/session-context-tab.tsx:220-223 (the current cacheTokens row renders the raw read/write pair).
The change is small in scope:
- Add a cacheHitRate: number | null field to the Context type in session-context-metrics.ts, computed in build().
- Add a new context.stats.cacheHitRate row in the stats array in session-context-tab.tsx (and the i18n key).
- Decide whether to replace or augment the existing cacheTokens row.
Why this is a differentiator, not a parity feature:
Per the analysis in wiki/articles/ai-coding-tools/prompt-caching-mechanism.md, prompt caching turns a 6.00x nominal cost into a 0.945x effective cost when working normally. Compact, model switches, and system prompt changes are hard breakpoints that drop the rate to zero. Codex's open issue openai/codex#18815 is the same request for their TUI; PawWork can ship it first.