
fix(ui): context % uses full token footprint instead of uncached input only #49917

Closed

jakepresent wants to merge 2 commits into openclaw:main from jakepresent:fix/control-ui-context-percent

Conversation

@jakepresent (Contributor) commented Mar 18, 2026

Problem

The control dashboard footer shows 0-1% context usage when `/status` reports 21% for the same session.

Dashboard footer: `↑3 ↓505 R26.1k W309 0% ctx`
/status: `Context: 26k/128k (21%)`

Root Cause

`extractGroupMeta` in `grouped-render.ts` computes `contextPercent` by dividing only the uncached input tokens (`input`) by the context window. For a session with 26k total tokens where only 309 were uncached (new), this produces `309 / 128000 = 0%` instead of the correct `26000 / 128000 ≈ 21%`.

The backend already computes this correctly via `derivePromptTokens` (`input + cacheRead + cacheWrite`), but three UI surfaces weren't using the same formula.
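
As a minimal sketch of that formula (field names follow the PR text; the exact cacheRead/cacheWrite split for this session is an assumption for illustration):

```ts
// Prompt footprint per the derivePromptTokens formula described above:
// uncached input plus both prompt-cache legs.
interface Usage {
  input: number;      // uncached input tokens
  cacheRead: number;  // tokens read from the prompt cache
  cacheWrite: number; // tokens written to the prompt cache
}

const promptTokens = (u: Usage): number => u.input + u.cacheRead + u.cacheWrite;

// PR example: 309 uncached tokens, ~26k cached, 128k context window.
const usage: Usage = { input: 309, cacheRead: 26_100, cacheWrite: 0 };
console.log(Math.round((usage.input / 128_000) * 100));         // 0  (old UI numerator)
console.log(Math.round((promptTokens(usage) / 128_000) * 100)); // 21 (matches /status)
```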

Fix

**`grouped-render.ts`** (footer `% ctx`):

  • Compute `contextTotal = input + cacheRead + cacheWrite` and use that as the numerator, matching `derivePromptTokens` in `src/agents/usage.ts` (see the sketch after this list).

**`chat.ts`** (`renderContextNotice`, the 85%+ warning bar):

  • Use `totalTokens` (which is already `input + cacheRead + cacheWrite` per `session-store.ts`) instead of `inputTokens`.

**`slash-command-executor.ts`** (`/usage` command):

  • Use `totalTokens` for the context percentage instead of `inputTokens`.
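
A sketch of the guard change this implies for the footer path, also flagged in the review below (standalone form and values are illustrative, not the actual code):

```ts
// Old guard `input > 0` hides the percentage whenever a turn is fully cached;
// the new guard `contextTotal > 0` still renders it (illustrative values).
const input = 0;           // fully cached turn: zero uncached input tokens
const cacheRead = 26_100;
const cacheWrite = 0;
const contextWindow = 128_000;

const contextTotal = input + cacheRead + cacheWrite;

const oldPercent =
  contextWindow && input > 0
    ? Math.min(Math.round((input / contextWindow) * 100), 100)
    : null; // null: the footer renders no usable percentage

const newPercent =
  contextWindow && contextTotal > 0
    ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
    : null; // 20
```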

Repro

  1. Open a session with prompt caching active (most Anthropic/OpenAI models)
  2. Send a few messages to build context
  3. Compare the footer `% ctx` with the `/status` output
  4. Footer shows near-0%, `/status` shows the correct percentage

Observed with `github-copilot/claude-opus-4.6` on OpenClaw 2026.3.13.

Fixes #49824

@greptile-apps (Bot) commented Mar 18, 2026

Greptile Summary

This PR fixes two related UI bugs where context usage percentages were computed using only uncached input tokens (`input`) instead of the full token footprint (`input + cacheRead + cacheWrite`), causing the footer `% ctx` display and the 85%+ warning bar to show near-zero values for sessions with significant prompt-cache hits.

  • `grouped-render.ts`: Introduces a `contextTotal` intermediate variable (`input + cacheRead + cacheWrite`) and uses it as the numerator for `contextPercent`, matching `derivePromptTokens` in `src/agents/usage.ts`. The guard condition is also updated from `input > 0` to `contextTotal > 0` so that cache-only sessions (zero uncached input, non-zero cache hits) still display a valid percentage.
  • `chat.ts` (`renderContextNotice`): Replaces `session?.inputTokens` with `session?.totalTokens ?? session?.inputTokens` for the warning-bar threshold calculation. `GatewaySessionRow.totalTokens` is already filtered for freshness server-side via `resolveFreshSessionTotalTokens`, so the `inputTokens` fallback is only exercised for sessions with no usage data yet.

Both changes are internally consistent and correctly align the UI with the `derivePromptTokens` / `deriveSessionTotalTokens` formula used by the backend `/status` command. No new tests are added for the changed render logic, though the fix is straightforward enough that the manual repro steps in the PR description sufficiently validate the behavior.

Confidence Score: 5/5

  • This PR is safe to merge — the changes are isolated to two UI render functions and correctly align the context percentage calculation with the backend formula.
  • Both changes are narrowly scoped to display logic, the fix is consistent with `derivePromptTokens` / `deriveSessionTotalTokens` in the backend, the fallback path in `chat.ts` is safe (falls back to `inputTokens` when `totalTokens` is absent), and the new `contextTotal > 0` guard in `grouped-render.ts` is strictly more correct than the old `input > 0` guard. No behavioral regressions are possible from these changes.
  • No files require special attention.

Last reviewed commit: "fix(ui): /usage comm..."

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53ee7ac534


Comment on lines +240 to +243:

```diff
+  const contextTotal = input + cacheRead + cacheWrite;
   const contextPercent =
-    contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+    contextWindow && contextTotal > 0
+      ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```

P1: Avoid double-counting cached prompt tokens in group ctx %

For providers that return OpenAI-style `prompt_tokens` together with a cached subset (`cached_tokens` / `prompt_tokens_details.cached_tokens`), `usage.input` already includes the cached prompt portion. The normalization tests in `src/agents/usage.test.ts:57-88` preserve that shape, so adding `cacheRead` and `cacheWrite` again here turns a 1,113-token prompt into 2,137 tokens. In chats backed by Moonshot/Kimi/K2-style responses, the footer will therefore overstate `% ctx` and can show `100% ctx` even when the actual prompt is well below the model window.
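
To make the double-count concrete, a sketch reusing the comment's numbers (the 89-token uncached slice is implied by 1,113 - 1,024; the shapes are as the comment describes them):

```ts
// Same 1,113-token prompt with a 1,024-token cached subset, under the two
// normalized shapes described above.

// Anthropic-style: `input` carries only the uncached slice, so adding the
// cache legs reconstructs the true prompt size.
const anthropicStyle = { input: 89, cacheRead: 1024, cacheWrite: 0 };
anthropicStyle.input + anthropicStyle.cacheRead + anthropicStyle.cacheWrite; // 1113 (correct)

// OpenAI/Moonshot-style: `input` mirrors prompt_tokens, which already
// includes the cached subset, so the same sum counts those tokens twice.
const openaiStyle = { input: 1113, cacheRead: 1024, cacheWrite: 0 };
openaiStyle.input + openaiStyle.cacheRead + openaiStyle.cacheWrite; // 2137 (double-counted)
```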

@jakepresent (Contributor, Author) commented Mar 18, 2026

Good catch. For Anthropic-style providers, `input` is only the uncached portion, so `input + cacheRead + cacheWrite` is the correct total prompt size (this matches `derivePromptTokens` in `src/agents/usage.ts:138-151`, which the backend already uses for `/status`).

For Moonshot/Kimi-style providers where `prompt_tokens` already includes cached tokens, adding `cacheRead` would double-count - but that's a pre-existing issue in `derivePromptTokens` itself, not introduced by this PR. The UI was previously showing 0% for all cached sessions; now it matches what `/status` reports (which also uses `derivePromptTokens`).

The normalization layer not consistently representing `input` as uncached-only is arguably the deeper issue, but that's a larger change across providers and outside the scope of this fix.

Comment thread on ui/src/ui/views/chat.ts (outdated):

```ts
// totalTokens is a prompt/context snapshot (input + cacheRead + cacheWrite),
// which represents actual context occupancy. Fall back to inputTokens when
// totalTokens is unavailable (stale or missing).
const used = session?.totalTokens ?? session?.inputTokens ?? 0;
```

P1: Keep context warning based on non-duplicated prompt tokens

`session.totalTokens` is derived as `input + cacheRead + cacheWrite` in `src/agents/usage.ts:153-190`, while the usage normalization tests in `src/agents/usage.test.ts:57-88` show that some providers already report `input` as the full `prompt_tokens` and expose the cached subset separately. Switching this banner from `inputTokens` to `totalTokens` therefore makes cached conversations from Moonshot/Kimi/K2-style providers look much fuller than they are, which can trigger the 85%/90% warning and compaction UX too early.

@jakepresent (Contributor, Author) commented Mar 18, 2026

Same reasoning as above - this change aligns the UI with what the backend `/status` already reports via `deriveSessionTotalTokens`. The potential double-counting for Moonshot/Kimi-style providers exists in `derivePromptTokens` already; this PR makes the UI consistent with the backend rather than introducing a new discrepancy.

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8f842a4e2


Comment on lines +240 to +243:

```diff
+  const contextTotal = input + cacheRead + cacheWrite;
   const contextPercent =
-    contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+    contextWindow && contextTotal > 0
+      ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```

P2: Use the latest prompt snapshot for grouped context %

When a chat turn spans multiple assistant messages, this group footer now adds every message's `input + cacheRead + cacheWrite` together. That overstates context in the common case where tool results are hidden: `buildChatItems` drops tool messages before grouping (`ui/src/ui/views/chat.ts:1417-1418`), then `groupMessages` merges the surrounding assistant messages (`ui/src/ui/views/chat.ts:1355-1373`). The cache fields are explicitly treated as non-additive snapshots elsewhere (`src/agents/pi-embedded-subscribe.ts:201-213`), so a tool-heavy turn can show an inflated `ctx%` or `100% ctx` here even though the current prompt is still well below the model window.
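
A sketch of the snapshot-versus-sum distinction the comment draws (the type and values are illustrative):

```ts
// Each assistant message's usage is a snapshot of the whole prompt at that
// point, not an additive delta (type and values are illustrative).
interface MessageUsage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

const footprint = (u: MessageUsage) => u.input + u.cacheRead + u.cacheWrite;

// Two assistant messages merged into one visual group after the tool
// messages between them are dropped.
const group: MessageUsage[] = [
  { input: 200, cacheRead: 20_000, cacheWrite: 0 },
  { input: 150, cacheRead: 20_200, cacheWrite: 0 },
];

group.reduce((sum, u) => sum + footprint(u), 0); // 40,550: summed snapshots overstate
footprint(group[group.length - 1]);              // 20,350: the actual current prompt
```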

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58bafec132


Comment on lines +240 to +243:

```diff
+  const contextTotal = input + cacheRead + cacheWrite;
   const contextPercent =
-    contextWindow && input > 0 ? Math.min(Math.round((input / contextWindow) * 100), 100) : null;
+    contextWindow && contextTotal > 0
+      ? Math.min(Math.round((contextTotal / contextWindow) * 100), 100)
```

P2: Normalize raw cache-token aliases before summing group context

`chat.history` forwards each assistant message's usage object as stored in the transcript (`src/gateway/server-methods/chat.ts:980-988`, covered by `src/gateway/server.chat.gateway-server-chat-b.test.ts:290-317`), but this new `contextTotal` only includes cache reads that were already normalized to `cacheRead` / `cache_read_input_tokens`. The shared usage normalizer also treats `cached_tokens` and `prompt_tokens_details.cached_tokens` as cache reads (`src/agents/usage.ts:107-113`), which is the shape used by Moonshot/Kimi-style cached responses. For those sessions, `cacheRead` remains 0 here, so the grouped footer still shows the old near-0% ctx even though the backend/session snapshot reports the larger cached prompt size.
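
A rough sketch of the alias gap being described (the raw field names follow the comment; the resolution helper is illustrative, not the actual normalizer):

```ts
// Transcript-stored usage may carry cache reads under several aliases; a sum
// that only reads a normalized `cacheRead` field misses the raw shapes.
type RawUsage = {
  cacheRead?: number;
  cache_read_input_tokens?: number;
  cached_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
};

// Illustrative alias resolution (the real mapping lives in src/agents/usage.ts).
function cacheReadOf(raw: RawUsage): number {
  return (
    raw.cacheRead ??
    raw.cache_read_input_tokens ??
    raw.cached_tokens ??
    raw.prompt_tokens_details?.cached_tokens ??
    0
  );
}

cacheReadOf({ cacheRead: 512 });                                 // 512
cacheReadOf({ prompt_tokens_details: { cached_tokens: 1024 } }); // 1024: Moonshot/Kimi shape
```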

@shuv-amp commented

I checked this against #51060 and I think there's still one gap in the grouped footer path. This seems to fix the cache-token undercount case, but the grouped `% ctx` still appears to be based on the cumulative prompt footprint across the whole visual group. In a post-compaction flow, I can still reproduce `100% ctx` even when the latest assistant prompt footprint is much lower. I tried a small regression test locally on this PR head and it still failed with: expected "27% ctx", received "100% ctx". So this looks very close, but I think #51060 may still need the grouped footer to use the latest assistant prompt footprint rather than the grouped sum.

@jakepresent (Contributor, Author) commented Mar 26, 2026

> I checked this against #51060 and I think there's still one gap in the grouped footer path. This seems to fix the cache-token undercount case, but the grouped `% ctx` still appears to be based on the cumulative prompt footprint across the whole visual group. In a post-compaction flow, I can still reproduce `100% ctx` even when the latest assistant prompt footprint is much lower. I tried a small regression test locally on this PR head and it still failed with: expected "27% ctx", received "100% ctx". So this looks very close, but I think #51060 may still need the grouped footer to use the latest assistant prompt footprint rather than the grouped sum.

Thanks for testing this! You're right that the grouped footer path has a separate issue post-compaction. My PR specifically targets the cache-token undercount. Sessions with high prompt-cache hit rates were showing 0% ctx because only uncached input tokens were being counted. That's the bug I was seeing in practice and what this fixes.

The grouped-sum-after-compaction behavior you're describing sounds like a different bug where the footer accumulates stale pre-compaction token counts into the group total. Sounds like it could be addressed in #51060 if that's already scoped for it.

Commits

fix(ui): context % uses full token footprint instead of uncached input only

The control dashboard footer showed 0-1% context usage when /status
reported 21% for the same session. The root cause was that
extractGroupMeta divided only the uncached input tokens by the context
window, ignoring cacheRead and cacheWrite. For a session with 26k
total tokens where 309 were uncached, this produced 0% instead of ~21%.

Changes:
- grouped-render.ts: compute contextPercent from input + cacheRead +
  cacheWrite (matching derivePromptTokens in the backend)
- chat.ts: renderContextNotice (85%+ warning bar) now uses totalTokens
  (already input+cache in the session store) instead of inputTokens

Fixes openclaw#45268, openclaw#49824

fix(ui): /usage comm…

Same issue as the footer - /usage divided only uncached inputTokens
by the context window. Now uses totalTokens (input + cacheRead +
cacheWrite) when available.
@jakepresent force-pushed the fix/control-ui-context-percent branch from 178787d to ed51308 on March 26, 2026 at 21:21
@jakepresent (Contributor, Author) commented

I also rebased onto main - upstream has since fixed the `chat.ts` side of this (the `totalTokensFresh` guard), so the remaining diff is just the grouped footer fix in `grouped-render.ts` and the `/usage` command fix in `slash-command-executor.ts`.

@steipete (Contributor) commented

Closing this as implemented after Codex review.

Current main already implements the requested cached-token context accounting for the grouped footer and /usage, and the warning-bar side was separately folded into the fresh totalTokens snapshot path with regression tests.

What I checked:

  • Grouped footer uses the full prompt footprint: `extractGroupMeta` now computes `promptTokens = input + cacheRead + cacheWrite` and derives `contextPercent` from that total, matching the PR's core footer fix. (ui/src/ui/chat/grouped-render.ts:470, 84dc9f12f1b9)
  • `/usage` uses a fresh context snapshot: `executeUsage` now prefers `session.totalTokens` for the context percentage and suppresses the percentage when `totalTokensFresh` is false, which covers the PR's remaining `/usage` change plus the later freshness follow-up. (ui/src/ui/chat/slash-command-executor.ts:385, 84dc9f12f1b9)
  • Control warning bar already moved to `totalTokens`: the context-notice path reads `session.totalTokens`, ignores stale snapshots, and computes the warning ratio from that value rather than raw `inputTokens`. This matches the author's March 26 note that the chat.ts side had already been fixed upstream. (ui/src/ui/chat/context-notice.ts:64, 84dc9f12f1b9)
  • Backend normalization now avoids cached-token double counting: `normalizeUsage` subtracts OpenAI-style cached prompt tokens from raw prompt totals before deriving prompt/context usage, addressing the review-thread concern that some providers would otherwise overcount when `cacheRead` is added back in (see the sketch after this list). (src/agents/usage.ts:130, 84dc9f12f1b9)
  • Regression tests cover cached-token and snapshot behavior: UI tests now assert 44% ctx for cached footer usage, 38% and 31% `/usage` percentages from the session snapshot, and that stale snapshots hide the warning/percentage paths. (ui/src/ui/chat/grouped-render.test.ts:352, 84dc9f12f1b9)
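
For the normalization bullet above, a sketch of the subtraction being described (illustrative only; the actual logic is `normalizeUsage` in src/agents/usage.ts):

```ts
// Subtract the cached subset from OpenAI-style prompt totals so `input`
// becomes uncached-only; adding cacheRead back then no longer double-counts.
function uncachedInput(promptTokens: number, cachedTokens: number): number {
  return Math.max(promptTokens - cachedTokens, 0);
}

// Reusing the review thread's example: 1,113 prompt tokens, 1,024 cached.
const input = uncachedInput(1113, 1024); // 89
const footprint = input + 1024;          // 1113, not 2137
```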

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 84dc9f12f1b9; fix evidence: commit 84dc9f12f1b9.


Development

Successfully merging this pull request may close these issues:

  • Bug: webchat status bar shows incorrect remaining tokens and context percentage