Skip to content

fix(agents): scope stale usage clearing to pre-compaction messages only [AI-assisted]#91468

Open
425072024 wants to merge 5 commits into
openclaw:mainfrom
425072024:fix/compaction-stale-usage-clear-50795
Open

fix(agents): scope stale usage clearing to pre-compaction messages only [AI-assisted]#91468
425072024 wants to merge 5 commits into
openclaw:mainfrom
425072024:fix/compaction-stale-usage-clear-50795

Conversation

@425072024

@425072024 425072024 commented Jun 8, 2026

Copy link
Copy Markdown

Summary

  • Problem: After compaction, the TUI context token counter always shows 0/1.0m (0%) even after successful LLM responses. Issue Bug: Context token count always shows 0 after compaction #50795.
  • Root Cause: clearStaleAssistantUsageOnSessionMessages unconditionally zeros usage on all assistant messages after every successful compaction. Post-compaction messages with valid usage are reset alongside stale pre-compaction messages.
  • Fix: Track pre-compaction assistant messages via object references (preCompactionAssistantMessages: Set<object>) captured in handleCompactionStart. In clearStaleAssistantUsageOnSessionMessages, only zero usage for messages whose references exist in the Set. New messages are excluded.
  • Why object-identity: Compaction rewrites session.messages, invalidating indices. Object references survive rewrites — retained messages keep their identity, new messages have distinct references.

Verification

  • compaction handler tests: 12/12 pass
  • oxfmt --check passes

Fixes #50795

clearStaleAssistantUsageOnSessionMessages unconditionally zeroed
usage on *all* assistant messages after every compaction. This broke
the context token counter in the TUI because new assistant messages
with valid usage from fresh LLM responses were also zeroed.

Store the pre-compaction message count in handleCompactionStart and
only clear usage on messages that existed before the compaction.
Messages added during or after compaction retain their valid usage.

Fixes openclaw#50795
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 8, 2026
@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 9, 2026, 5:34 AM ET / 09:34 UTC.

Summary
The PR adds a pre-compaction assistant-message reference set to embedded-agent subscription state and uses it to skip zeroing usage on assistant messages created after compaction.

PR surface: Source +87. Total +87 across 2 files.

Reproducibility: yes. from source inspection: current main clears every assistant usage snapshot on non-retry compaction end, and the linked report describes the visible counter consequence. I did not run a live TUI/gateway reproduction in this read-only review.

Review metrics: 2 noteworthy metrics.

  • Focused regression tests: 0 added. The patch changes session-state accounting, but the diff does not add a test for preserving fresh post-compaction assistant usage.
  • Subscription state surface: 1 field added. The new pre-compaction reference set is the durable boundary that decides which assistant usage snapshots are cleared.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add or point to a focused regression test for preserving post-compaction assistant usage.
  • [P1] Add redacted real TUI/gateway counter proof to the PR body.
  • Rebase or otherwise refresh the branch so GitHub reports a clean merge state.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR has test and formatter output only; it still needs redacted terminal output, logs, screenshot, recording, or linked artifact showing the real post-compaction counter after a fresh assistant response. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A visible proof lane would materially help because the reported failure is the TUI context counter after compaction. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify the TUI Context counter shows nonzero usage after compaction and a fresh assistant response.

Risk before merge

  • [P1] The contributor proof is still test-only; it does not show the real TUI or gateway context counter after compaction and a fresh assistant response.
  • [P1] The branch does not add focused regression coverage for the object-identity boundary, so the session-state invariant can regress silently.
  • [P1] The live PR metadata reports a dirty merge state, so the reviewed diff is not yet the exact merge result maintainers would land.

Maintainer options:

  1. Refresh, test, and prove (recommended)
    Rebase or resolve the dirty branch, add the focused compaction-boundary regression test, and update the PR body with redacted real TUI/gateway counter proof before merge.
  2. Maintainer-owned proof
    A maintainer could run and document the real compaction counter scenario themselves, then accept the remaining contributor-proof gap explicitly.
  3. Pause until evidence is available
    If the branch cannot be refreshed or the real counter path cannot be demonstrated, keep the PR paused rather than landing session-state accounting changes on test-only evidence.

Next step before merge

  • [P1] The remaining blockers are contributor real behavior proof, focused regression coverage, and a dirty PR branch, not a safe ClawSweeper repair lane.

Security
Cleared: The diff only changes in-memory compaction accounting/types; I found no dependency, CI, secret-handling, or code-execution surface change.

Review findings

  • [P2] Add the compaction-boundary regression test — src/agents/embedded-agent-subscribe.handlers.compaction.ts:234-235
Review details

Best possible solution:

Land a refreshed version that preserves fresh post-compaction assistant usage, keeps stale pre-compaction snapshots zeroed, and includes both focused regression coverage and redacted real counter proof.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main clears every assistant usage snapshot on non-retry compaction end, and the linked report describes the visible counter consequence. I did not run a live TUI/gateway reproduction in this read-only review.

Is this the best way to solve the issue?

Yes, with validation gaps: the object-reference boundary is a narrow fix because current compaction rebuilds the message array from retained message objects and the agent setter only copies the top-level array. It still needs regression coverage and real behavior proof before it is the best mergeable version.

Full review comments:

  • [P2] Add the compaction-boundary regression test — src/agents/embedded-agent-subscribe.handlers.compaction.ts:234-235
    The new guard relies on object identity captured at compaction start, but the branch still does not add a test that runs compaction_start, rewrites session.messages, keeps a pre-compaction assistant, and verifies a fresh post-compaction assistant keeps nonzero usage. Without that coverage, the session-state invariant that fixes the visible counter can regress silently.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.78

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5e1fbca3cbc6.

Label changes

Label justifications:

  • P2: This is a normal-priority session-state bug fix with limited blast radius but visible TUI impact.
  • merge-risk: 🚨 session-state: The PR changes how compaction mutates assistant usage snapshots, which can affect context counter and session accounting state after merge.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR has test and formatter output only; it still needs redacted terminal output, logs, screenshot, recording, or linked artifact showing the real post-compaction counter after a fresh assistant response. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +87. Total +87 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 2 97 10 +87
Tests 0 0 0 0
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 97 10 +87

What I checked:

Likely related people:

  • Patrick Erichsen: The relevant compaction subscription handler, session reconstruction path, and agent state implementation all blame to commit e8cf6df in current history. (role: recent area contributor; confidence: high; commits: e8cf6df3a358; files: src/agents/embedded-agent-subscribe.handlers.compaction.ts, src/agents/sessions/agent-session.ts, src/agents/sessions/session-manager.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 8, 2026
@425072024

Copy link
Copy Markdown
Author

🔧 修复说明 (Addressing ClawSweeper P1 feedback)

问题:index-based boundary 在 compaction 重写后不可靠

原 PR 使用 preCompactionMessageCount(记录 compaction 前的消息数),然后用索引比较 i >= preCount 来跳过新消息。但 ClawSweeper 指出:compaction 会重写 session messages,所以 compaction 前的索引在 compaction 后可能不再有意义。

修复:改用 object-identity Set

preCompactionMessageCount?: number 替换为 preCompactionAssistantMessages?: Set<object>

  1. handleCompactionStart: 新增 collectAssistantMessages() 辅助函数,遍历所有消息,将 assistant 消息的对象引用收集到 Set 中
  2. clearStaleAssistantUsageOnSessionMessages: 用 Set.has() 做引用比较,只清零存在于 Set 中的消息(pre-compaction 保留的消息),不在 Set 中的消息(compaction 后新增的)被 continue 跳过

为什么 object-identity 是安全的

  • Compaction 重写 session.messages 时,保留的消息保持其对象引用身份
  • Compaction 后新增的消息不会是 Set 中的引用
  • 因此能准确区分 pre-compaction 和 post-compaction 消息

修改文件

  • src/agents/embedded-agent-subscribe.handlers.types.tspreCompactionMessageCount?: numberpreCompactionAssistantMessages?: Set<object>
  • src/agents/embedded-agent-subscribe.handlers.compaction.ts — 新增 collectAssistantMessages() + 重写清除循环逻辑

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@425072024

Copy link
Copy Markdown
Author

🔄 All P1 Fixes Complete

  1. ✅ Object-identity Set replaces index-based cutoff — safe across compaction rewrites
  2. ✅ Merge conflict resolved — rebased onto latest base
  3. ✅ 12 compaction handler tests pass

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Context token count always shows 0 after compaction

1 participant