fix(ui): context % uses full token footprint instead of uncached input only #49917
jakepresent wants to merge 2 commits into openclaw:main
Conversation
Greptile Summary: This PR fixes two related UI bugs where context usage percentages were being computed using only uncached input tokens (`input`), ignoring cached tokens.
Both changes are internally consistent and correctly align the UI with the backend's `derivePromptTokens` calculation. Confidence Score: 5/5
Last reviewed commit: "fix(ui): /usage comm..."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53ee7ac534
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
+const contextTotal = input + cacheRead + cacheWrite;
 const contextPercent =
-  contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+  contextWindow && contextTotal > 0
+    ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```
Avoid double-counting cached prompt tokens in group ctx %
For providers that return OpenAI-style prompt_tokens together with a cached subset (cached_tokens / prompt_tokens_details.cached_tokens), usage.input already includes the cached prompt portion. The normalization tests in src/agents/usage.test.ts:57-88 preserve that shape, so adding cacheRead and cacheWrite again here turns a 1,113-token prompt into 2,137 tokens. In chats backed by Moonshot/Kimi/K2-style responses, the footer will therefore overstate % ctx and can show 100% ctx even when the actual prompt is well below the model window.
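To make the two accounting conventions concrete, here is a hypothetical sketch; the field names and numbers follow this comment, not the project's real normalized types:

```ts
// Hypothetical usage shapes, for illustration only.

// Anthropic-style accounting: `input` counts only the uncached prompt tokens.
const anthropicStyle = { input: 309, cacheRead: 25_700, cacheWrite: 0 };
// OpenAI/Moonshot-style: prompt_tokens already includes the cached subset.
const openaiStyle = { input: 1_113, cacheRead: 1_024, cacheWrite: 0 };

// Summing all three fields is right for the first shape but double-counts the
// second: a 1,113-token prompt is reported as 1,113 + 1,024 = 2,137.
const footprint = (u: { input: number; cacheRead: number; cacheWrite: number }) =>
  u.input + u.cacheRead + u.cacheWrite;

console.log(footprint(anthropicStyle)); // 26009
console.log(footprint(openaiStyle)); // 2137 (overstated)
```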
Good catch. For Anthropic-style providers, input is only the uncached portion, so input + cacheRead + cacheWrite is the correct total prompt size (this matches derivePromptTokens in src/agents/usage.ts:138-151, which the backend already uses for /status).
For Moonshot/Kimi-style providers where prompt_tokens already includes cached tokens, adding cacheRead would double-count - but that's a pre-existing issue in derivePromptTokens itself, not introduced by this PR. The UI was previously showing 0% for all cached sessions; now it matches what /status reports (which also uses derivePromptTokens).
The normalization layer not consistently representing input as uncached-only is arguably the deeper issue, but that's a larger change across providers and outside the scope of this fix.
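For illustration, one possible shape of that "uncached-only" normalization for OpenAI-shaped usage; this is a hypothetical sketch, not the project's actual code:

```ts
// Hypothetical sketch only; NOT the project's derivePromptTokens or normalizer.
type NormalizedUsage = { input: number; cacheRead: number; cacheWrite: number };

function normalizeOpenAiStyle(raw: {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}): NormalizedUsage {
  const cached = raw.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    // Subtract the cached subset so `input` means uncached-only, matching the
    // Anthropic-style convention; input + cacheRead + cacheWrite then stops
    // double-counting for providers that report inclusive prompt_tokens.
    input: Math.max(raw.prompt_tokens - cached, 0),
    cacheRead: cached,
    cacheWrite: 0,
  };
}
```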
```ts
// totalTokens is a prompt/context snapshot (input + cacheRead + cacheWrite),
// which represents actual context occupancy. Fall back to inputTokens when
// totalTokens is unavailable (stale or missing).
const used = session?.totalTokens ?? session?.inputTokens ?? 0;
```
Keep context warning based on non-duplicated prompt tokens
session.totalTokens is derived as input + cacheRead + cacheWrite in src/agents/usage.ts:153-190, while the usage normalization tests in src/agents/usage.test.ts:57-88 show that some providers already report input as full prompt_tokens and expose the cached subset separately. Switching this banner from inputTokens to totalTokens therefore makes cached conversations from Moonshot/Kimi/K2-style providers look much fuller than they are, which can trigger the 85%/90% warning and compaction UX too early.
Same reasoning as above - this change aligns the UI with what the backend /status already reports via deriveSessionTotalTokens. The potential double-counting for Moonshot/Kimi-style providers exists in derivePromptTokens already; this PR makes the UI consistent with the backend rather than introducing a new discrepancy.
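For context, a minimal sketch of the 85%/90% decision being discussed, with hypothetical names; the real renderContextNotice may differ:

```ts
// Hypothetical sketch of the warning-bar decision; thresholds from the comments above.
function contextWarningLevel(used: number, contextWindow: number): "none" | "warn" | "critical" {
  if (!contextWindow || used <= 0) return "none";
  const pct = (used / contextWindow) * 100;
  if (pct >= 90) return "critical";
  if (pct >= 85) return "warn";
  return "none";
}

// e.g. contextWarningLevel(session?.totalTokens ?? session?.inputTokens ?? 0, 128_000)
```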
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8f842a4e2
```diff
+const contextTotal = input + cacheRead + cacheWrite;
 const contextPercent =
-  contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+  contextWindow && contextTotal > 0
+    ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```
Use the latest prompt snapshot for grouped context %
When a chat turn spans multiple assistant messages, this group footer now adds every message's input + cacheRead + cacheWrite together. That overstates context in the common case where tool results are hidden: buildChatItems drops tool messages before grouping (ui/src/ui/views/chat.ts:1417-1418), then groupMessages merges the surrounding assistant messages (ui/src/ui/views/chat.ts:1355-1373). The cache fields are explicitly treated as non-additive snapshots elsewhere (src/agents/pi-embedded-subscribe.ts:201-213), so a tool-heavy turn can show an inflated ctx% or 100% ctx here even though the current prompt is still well below the model window.
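To make the snapshot-vs-sum distinction concrete, a small illustrative sketch; the types are assumptions, not the UI's real message shape:

```ts
// Illustrative only: compares summing per-message snapshots vs. taking the latest one.
type Usage = { input: number; cacheRead: number; cacheWrite: number };
type Msg = { usage?: Usage };

const footprint = (u: Usage) => u.input + u.cacheRead + u.cacheWrite;

// Summing every assistant message in the visual group overstates a tool-heavy turn...
const groupedSum = (msgs: Msg[]) =>
  msgs.reduce((acc, m) => acc + (m.usage ? footprint(m.usage) : 0), 0);

// ...whereas the latest message's snapshot reflects the current prompt size.
const latestSnapshot = (msgs: Msg[]) => {
  const last = [...msgs].reverse().find((m) => m.usage);
  return last?.usage ? footprint(last.usage) : 0;
};
```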
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58bafec132
```diff
+const contextTotal = input + cacheRead + cacheWrite;
 const contextPercent =
-  contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+  contextWindow && contextTotal > 0
+    ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```
Normalize raw cache-token aliases before summing group context
chat.history forwards each assistant message's usage object as stored in the transcript (src/gateway/server-methods/chat.ts:980-988, covered by src/gateway/server.chat.gateway-server-chat-b.test.ts:290-317), but this new contextTotal only includes cache reads that were already normalized to cacheRead/cache_read_input_tokens. The shared usage normalizer also treats cached_tokens and prompt_tokens_details.cached_tokens as cache reads (src/agents/usage.ts:107-113), which is the shape used by Moonshot/Kimi-style cached responses. For those sessions, cacheRead remains 0 here, so the grouped footer still shows the old near-0% ctx even though the backend/session snapshot reports the larger cached prompt size.
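An illustrative sketch of the alias handling this comment describes; the field names are taken from the comment, and the actual normalizer in src/agents/usage.ts may differ:

```ts
// Hypothetical raw-usage shape covering the cache-read aliases listed above.
type RawUsage = {
  cacheRead?: number;
  cache_read_input_tokens?: number;
  cached_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
};

// Resolve cache reads from whichever alias the provider populated.
function cacheReadTokens(u: RawUsage): number {
  return (
    u.cacheRead ??
    u.cache_read_input_tokens ??
    u.cached_tokens ??
    u.prompt_tokens_details?.cached_tokens ??
    0
  );
}
```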
I checked this against #51060 and I think there’s still one gap in the grouped footer path. This fixes the cache-token undercount case, but the grouped % ctx still appears to be based on the cumulative prompt footprint across the whole visual group. In a post-compaction flow, I can still reproduce 100% ctx even when the latest assistant prompt footprint is much lower. I tried a small regression test locally on this PR head and it still failed with: expected "27% ctx", received "100% ctx". So this looks very close, but I think #51060 may still need the grouped footer to use the latest assistant prompt footprint rather than the grouped sum.
Thanks for testing this! You're right that the grouped footer path has a separate issue post-compaction. My PR specifically targets the cache-token undercount: sessions with high prompt-cache hit rates were showing 0% ctx because only uncached input tokens were being counted. That's the bug I was seeing in practice and what this fixes. The grouped-sum-after-compaction behavior you're describing sounds like a different bug where the footer accumulates stale pre-compaction token counts into the group total. Sounds like it could be addressed in #51060 if that's already scoped for it.
fix(ui): context % uses full token footprint instead of uncached input only

The control dashboard footer showed 0-1% context usage when /status reported 21% for the same session. The root cause was that extractGroupMeta divided only the uncached input tokens by the context window, ignoring cacheRead and cacheWrite. For a session with 26k total tokens where 309 were uncached, this produced 0% instead of ~21%.

Changes:
- grouped-render.ts: compute contextPercent from input + cacheRead + cacheWrite (matching derivePromptTokens in the backend)
- chat.ts: renderContextNotice (85%+ warning bar) now uses totalTokens (already input + cache in the session store) instead of inputTokens

Fixes openclaw#45268, openclaw#49824
Same issue as the footer - /usage divided only uncached inputTokens by the context window. Now uses totalTokens (input + cacheRead + cacheWrite) when available.
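A minimal sketch of that calculation, using hypothetical names rather than the actual slash-command-executor.ts code:

```ts
// Sketch of the /usage-style percentage: prefer the full prompt snapshot,
// fall back to uncached input tokens when it's unavailable.
function usagePercent(
  u: { totalTokens?: number; inputTokens?: number },
  contextWindow?: number,
): number | null {
  const used = u.totalTokens ?? u.inputTokens ?? 0;
  if (!contextWindow || used <= 0) return null;
  return Math.min(Math.round((used / contextWindow) * 100), 100);
}
```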
Force-pushed from 178787d to ed51308
I also rebased onto main - upstream has since fixed the chat.ts side of this (totalTokensFresh guard), so the remaining diff is just the grouped footer fix in grouped-render.ts and the /usage command fix in slash-command-executor.ts.
Closing this as implemented after Codex review. What I checked:
So I’m closing this as already implemented rather than keeping a duplicate issue open. Review notes: reviewed against 84dc9f12f1b9; fix evidence: commit 84dc9f12f1b9.
Problem
The control dashboard footer shows 0-1% context usage when `/status` reports 21% for the same session.
Dashboard footer: `↑3 ↓505 R26.1k W309 0% ctx`
/status: `Context: 26k/128k (21%)`
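As a quick sanity check (a sketch using the footer's numbers above; the cache-read figure is the displayed approximation), the cached tokens account for the ~21% that `/status` reports:

```ts
// Numbers read off the footer line; rounding mirrors the UI's Math.round.
const input = 3; // ↑3 uncached input tokens
const cacheRead = 26_100; // R26.1k
const cacheWrite = 309; // W309
const contextWindow = 128_000;

Math.round((input / contextWindow) * 100); // 0  (uncached input only)
Math.round(((input + cacheRead + cacheWrite) / contextWindow) * 100); // 21 (full footprint)
```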
Root Cause
`extractGroupMeta` in `grouped-render.ts` computes `contextPercent` by dividing only the uncached input tokens (`input`) by the context window. For a session with 26k total tokens where only 309 were uncached (new), this produces `309 / 128000 = 0%` instead of the correct `26000 / 128000 ≈ 21%`.
The backend already computes this correctly via `derivePromptTokens` (`input + cacheRead + cacheWrite`), but three UI surfaces weren't using the same formula.
Fix
**`grouped-render.ts`** (footer % ctx): compute `contextPercent` from `input + cacheRead + cacheWrite`, matching `derivePromptTokens` in the backend.
**`chat.ts`** (`renderContextNotice`, the 85%+ warning bar): use `totalTokens` (already input + cache in the session store) instead of `inputTokens`.
**`slash-command-executor.ts`** (`/usage` command): use `totalTokens` (`input + cacheRead + cacheWrite`) when available instead of only uncached `inputTokens`.
Repro
Observed with `github-copilot/claude-opus-4.6` on OpenClaw 2026.3.13.
Fixes #49824