
feat(agents): expose estimated context budget status #84785

Closed
giodl73-repo wants to merge 2 commits into openclaw:fix/68609-precheck-budget-log from giodl73-repo:budget-2-context-budget-status


@giodl73-repo

Contributor

Summary

Budget 2 in the context-budget stack, based on #84676.

This keeps the first PR's pre-prompt budget observation path and adds the read-only session/status surface for it:

  • capture a typed contextBudgetStatus snapshot from the pre-prompt estimate, including route, estimate, budget, reserve, overflow, and message counts
  • carry that snapshot through attempt metadata into SessionEntry, then expose it on gateway session rows
  • let /status show estimated context usage (~tokens/window (% est)) only when fresh provider context usage is unavailable or absent; positive fresh totalTokens remains authoritative
  • reserve the new top-level session field from plugin session-entry slot mirroring
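
To make the snapshot in the first bullet concrete, here is a minimal sketch of what a typed `contextBudgetStatus` could look like. Only the listed categories (route, estimate, budget, reserve, overflow, message counts) come from the summary above; every field name, the `buildContextBudgetStatus` helper, and the overflow formula are assumptions for illustration and are not taken from the PR diff.

```typescript
// Hypothetical snapshot shape; field names are illustrative, not the PR's actual type.
interface ContextBudgetStatus {
  route: string; // which pre-prompt path produced the estimate
  estimatedTokens: number; // pre-prompt token estimate
  budgetTokens: number; // window the estimate is measured against
  reserveTokens: number; // tokens held back for the reply
  overflowTokens: number; // 0 when the estimate plus reserve fits the budget
  messageCount: number; // messages included in the estimate
  capturedAt: number; // epoch milliseconds when the snapshot was taken
}

// Hypothetical builder. The overflow formula below is an assumption:
// overflow = max(0, estimate + reserve - budget).
function buildContextBudgetStatus(input: {
  route: string;
  estimatedTokens: number;
  budgetTokens: number;
  reserveTokens: number;
  messageCount: number;
  now: number;
}): ContextBudgetStatus {
  const overflowTokens = Math.max(
    0,
    input.estimatedTokens + input.reserveTokens - input.budgetTokens,
  );
  return {
    route: input.route,
    estimatedTokens: input.estimatedTokens,
    budgetTokens: input.budgetTokens,
    reserveTokens: input.reserveTokens,
    overflowTokens,
    messageCount: input.messageCount,
    capturedAt: input.now,
  };
}
```

Because the snapshot is plain data, it can be copied through attempt metadata into a session entry without carrying any policy decisions with it, which matches the reporting-only intent stated below.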

This is deliberately reporting-only. It does not introduce new compaction thresholds or policy decisions; that belongs in the next budget-management PR once the data is visible.

Refs #80594, #54996, #77992, #84490, #83177, #43009, #83526, #8635.

Verification

Behavior addressed: Sessions whose model/provider path cannot return fresh context usage can still expose OpenClaw's own pre-prompt estimate instead of showing only unknown/zero context usage.

Real environment tested: Windows worktree C:\src\claws-hapi-budget2, stacked on #84676 (d003bc28a9). The worktree uses a local node_modules junction to the main checkout's installed dependencies; no lockfile or dependency install changes were made.

Exact steps or command run after this patch:

  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/budget2.tsbuildinfo
  • node_modules\.bin\oxfmt.CMD --check src/config/sessions/types.ts src/plugins/session-entry-slot-keys.ts src/agents/pi-embedded-runner/run/preemptive-compaction.ts src/agents/pi-embedded-runner/types.ts src/agents/pi-embedded-runner/run/types.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run.ts src/agents/command/session-store.ts src/status/status-message.ts src/gateway/session-utils.types.ts src/gateway/session-utils.ts src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts src/agents/command/session-store.test.ts src/auto-reply/status.test.ts
  • git diff --check
  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts -t "builds a durable estimated context budget status snapshot"
  • node scripts/run-vitest.mjs src/agents/command/session-store.test.ts -t "persists estimated context budget status"
  • node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "uses estimated context budget status"
  • node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "prefers fresh totalTokens over estimated context budget status"
  • node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "uses estimated context budget status when token usage is absent"

Evidence after fix: Focused tests prove the snapshot shape, persistence without marking stale totals fresh, estimated status fallback, fresh-token precedence, and absent-usage fallback. tsgo, oxfmt --check, and git diff --check are clean.

Observed result after fix: Status can render Context: ~640k/1.0m (64% est) when stored totalTokens is stale, continues to render Context: 36k/1.0m (4%) when fresh usage exists, and avoids Context: 0/1.0m when only the estimate is available.
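
The three observed outputs can be reproduced with a small precedence sketch. This is not the code in src/status/status-message.ts: the `SessionUsage` shape, the `totalTokensFresh` flag, the `formatTokens` helper, and the `unknown` placeholder are all hypothetical names chosen for illustration. The only behavior taken from the PR description is the ordering: positive fresh `totalTokens` wins, and the estimate is used otherwise.

```typescript
// Hypothetical input shape for the status line; not the PR's actual types.
interface SessionUsage {
  totalTokens?: number; // provider-reported usage, possibly stale
  totalTokensFresh?: boolean; // assumed freshness flag
  contextWindow: number; // model context window in tokens
  estimatedTokens?: number; // pre-prompt estimate from the budget snapshot
}

// Compact token formatting: 36000 -> "36k", 1000000 -> "1.0m".
function formatTokens(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}m`;
  if (n >= 1_000) return `${Math.round(n / 1_000)}k`;
  return String(n);
}

function renderContextLine(u: SessionUsage): string {
  const window = formatTokens(u.contextWindow);
  const pct = (tokens: number): number =>
    Math.round((tokens / u.contextWindow) * 100);

  // Fresh, positive provider usage stays authoritative.
  if (u.totalTokensFresh && u.totalTokens !== undefined && u.totalTokens > 0) {
    return `Context: ${formatTokens(u.totalTokens)}/${window} (${pct(u.totalTokens)}%)`;
  }
  // Otherwise fall back to the estimate and mark it as one.
  if (u.estimatedTokens !== undefined && u.estimatedTokens > 0) {
    return `Context: ~${formatTokens(u.estimatedTokens)}/${window} (${pct(u.estimatedTokens)}% est)`;
  }
  // Hypothetical placeholder when neither value is available.
  return `Context: unknown/${window}`;
}
```

The `~` prefix and the `est` suffix keep an estimated value visually distinct from provider-reported usage, so a reader of /status can tell which source produced the number.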

What was not tested: Full Vitest files and broad pnpm check:changed were not run locally. The full heavy files timed out in this Codex worktree dependency setup; the new assertions were run by focused filters with unique Vitest cache paths where needed.

openclaw-barnacle Bot added the following labels on May 21, 2026:

  • gateway: Gateway runtime
  • agents: Agent runtime and tooling
  • size: M
  • maintainer: Maintainer-authored PR

@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a typed pre-prompt context budget snapshot, persists it in SessionEntry, exposes it on Gateway session rows, and lets /status display estimated context usage when fresh usage is unavailable.

Reproducibility: yes, for the review finding via source inspection: heartbeat runs pass preserveRuntimeModel, while the new session-store assignment still persists contextBudgetStatus. I did not run the PR branch or a live heartbeat scenario.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Summary: The implementation is directionally useful but not merge-ready because proof is test-only and the patch has a session-state correctness finding.

Rank-up moves:

  • Fix the preserved-runtime/heartbeat budget status ownership bug and add a focused regression test.
  • Add redacted real behavior proof showing the estimated /status or Gateway session-row output after the patch.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: The PR body lists focused unit tests, tsgo, oxfmt, and diff check, but it does not show /status or Gateway session output from a real OpenClaw run after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • The new field can preserve a heartbeat run's budget estimate on the main session even when runtime model/context are deliberately preserved, so later /status may show a snapshot for the wrong runtime.
  • The PR body contains focused tests and type/format checks, but no after-fix real /status or Gateway session output from a running OpenClaw setup.

Maintainer options:

  1. Preserve main-session budget ownership (recommended)
    Before merge, make contextBudgetStatus obey preserveRuntimeModel or clear stale snapshots, then add a heartbeat-bleed regression test.
  2. Accept diagnostic bleed intentionally
    Maintainers could accept the risk only if heartbeat-derived estimates are explicitly acceptable in main-session status and documented as such.

Next step before merge
Human review is needed because the PR is maintainer-labeled and needs both a small session-state fix and contributor real behavior proof before merge.

Security
Cleared: No dependency, CI, secret, permission, or code-execution surface changes were found; the concrete concern is session metadata correctness rather than a security regression.

Review findings

  • [P2] Keep heartbeat budget snapshots out of main session status — src/agents/command/session-store.ts:178-180
Review details

Best possible solution:

Persist and expose estimated context budget status only when it belongs to the same session runtime being reported, keep preserved-runtime heartbeat runs from replacing main-session budget status, and then prove the status output in a real run.

Do we have a high-confidence way to reproduce the issue?

Yes for the review finding via source inspection: heartbeat runs pass preserveRuntimeModel, while the new session-store assignment still persists contextBudgetStatus. I did not run the PR branch or a live heartbeat scenario.

Is this the best way to solve the issue?

No: the reporting path is plausible, but the PR should not persist or display a budget snapshot that does not match the reported session runtime. The safer fix is to gate or clear the snapshot alongside the existing preserved-runtime model/context behavior.
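
The gating described above can be sketched as follows. `preserveRuntimeModel`, `contextBudgetStatus`, and `SessionEntry` are names from the review; the reduced `SessionEntry` and `RunResult` shapes, the `applyRunToSession` helper, and the model names in the usage note are hypothetical and greatly simplified compared with src/agents/command/session-store.ts.

```typescript
// Reduced, hypothetical shapes; the real SessionEntry has many more fields.
interface ContextBudgetStatus {
  estimatedTokens: number;
  budgetTokens: number;
}

interface SessionEntry {
  model?: string;
  contextTokens?: number;
  contextBudgetStatus?: ContextBudgetStatus;
}

interface RunResult {
  model: string;
  contextTokens: number;
  contextBudgetStatus?: ContextBudgetStatus;
}

// Hypothetical helper: the budget snapshot follows the same ownership rule
// as the runtime model and context window.
function applyRunToSession(
  entry: SessionEntry,
  result: RunResult,
  opts: { preserveRuntimeModel?: boolean },
): SessionEntry {
  if (opts.preserveRuntimeModel) {
    // Heartbeat-style run: leave the main session's runtime metadata
    // and its budget snapshot untouched.
    return { ...entry };
  }
  return {
    ...entry,
    model: result.model,
    contextTokens: result.contextTokens,
    // A normal run without an estimate clears any older snapshot.
    contextBudgetStatus: result.contextBudgetStatus,
  };
}
```

With this shape, a heartbeat run on a hypothetical `heartbeat-model` leaves a main session on `main-model` with its own snapshot, so a later /status fallback cannot report the heartbeat's estimate or window.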

Label changes:

  • add P2: This is a normal-priority feature PR with a focused session-status correctness blocker and limited blast radius.
  • add merge-risk: 🚨 session-state: Merging as-is can mis-associate a budget snapshot from a heartbeat runtime with the main session state shown by /status.
  • add rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🦐 gold shrimp, and the implementation is directionally useful but not merge-ready: proof is test-only and the patch has a session-state correctness finding.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists focused unit tests, tsgo, oxfmt, and diff check, but it does not show /status or Gateway session output from a real OpenClaw run after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P2: This is a normal-priority feature PR with a focused session-status correctness blocker and limited blast radius.
  • merge-risk: 🚨 session-state: Merging as-is can mis-associate a budget snapshot from a heartbeat runtime with the main session state shown by /status.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🦐 gold shrimp, and the implementation is directionally useful but not merge-ready: proof is test-only and the patch has a session-state correctness finding.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists focused unit tests, tsgo, oxfmt, and diff check, but it does not show /status or Gateway session output from a real OpenClaw run after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

  • [P2] Keep heartbeat budget snapshots out of main session status — src/agents/command/session-store.ts:178-180
    preserveRuntimeModel is used for heartbeat runs so the heartbeat model/context do not bleed into the main session, but this new assignment still persists the heartbeat run's contextBudgetStatus. When a heartbeat uses a different model and the main session later lacks fresh totalTokens, /status can report the heartbeat estimate/window for the main session. Gate this field the same way as runtime model/context, or clear it when preserving runtime metadata, with coverage next to the heartbeat bleed tests.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

  • Protected label: The provided GitHub context lists the maintainer label on this PR, which requires explicit maintainer handling rather than automated cleanup closure.
  • Patch persists budget status unconditionally: The PR copies result.meta.agentMeta.contextBudgetStatus into the session entry without checking preserveRuntimeModel, so heartbeat or other preserved-runtime runs can still replace the visible budget snapshot. (src/agents/command/session-store.ts:178, eb4b8af6f3fa)
  • Heartbeat preservation contract: Current main documents preserveRuntimeModel as the guard that keeps a heartbeat turn's model and context window from bleeding into the main session's perceived state. (src/agents/command/session-store.ts:67, 6745fe8e7046)
  • Heartbeat caller: Heartbeat runs pass preserveRuntimeModel: opts.bootstrapContextRunKind === "heartbeat" when updating the session store, so the new budget status field needs the same ownership guard. (src/agents/agent-command.ts:1433, 6745fe8e7046)
  • Status fallback consumes the persisted snapshot: The PR changes /status to render contextBudgetStatus whenever current total tokens are missing or zero, which makes a mis-owned persisted snapshot user-visible. (src/status/status-message.ts:852, eb4b8af6f3fa)
  • Proof is test-only: The PR body reports focused Vitest filters, tsgo, oxfmt, and diff-check output, but no after-fix /status or Gateway session output from a running OpenClaw setup. (eb4b8af6f3fa)

Likely related people:

  • steipete: Current-main blame in the available checkout attributes the relevant status/session-store/gateway/session-slot lines to Peter Steinberger's grafted dependency-update snapshot; the local history is shallow, so this is a routing hint rather than full ownership proof. (role: recent area contributor; confidence: low; commits: 94ac563399b3; files: src/status/status-message.ts, src/agents/command/session-store.ts, src/gateway/session-utils.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6745fe8e7046.

clawsweeper Bot added the following labels on May 21, 2026:

  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • P2: Normal backlog priority with limited blast radius.
  • merge-risk: 🚨 session-state: 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.

@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@giodl73-repo

Contributor Author

Could not reopen this closed PR because GitHub would not allow changing the closed PR's old stack base. I rebased the branch onto main, fixed the heartbeat/preserved-runtime budget snapshot issue from the review, and opened the replacement here: #84830.


Labels

  • agents: Agent runtime and tooling
  • gateway: Gateway runtime
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 session-state: 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.
  • P2: Normal backlog priority with limited blast radius.
  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • size: M
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
