feat(agents): classify context budget pressure by giodl73-repo · Pull Request #84800 · openclaw/openclaw

giodl73-repo · 2026-05-21T04:09:29Z

Summary

Budget 3 in the context-budget stack, logically based on #84785. GitHub base is #84676 because #84785 is currently a fork-head branch; after #84785 lands or its branch is available upstream, this diff should collapse to the final budget-pressure commit.

This keeps budget management reporting-only and adds the first shared policy vocabulary on top of the persisted contextBudgetStatus snapshot:

add resolveSessionContextBudgetPolicy(...) with conservative pressure levels: safe, watch, pressure, and overflow-risk
classify against the prompt budget before reserve, so the policy reflects actual pre-prompt room instead of only raw context-window percentage
show the derived pressure in status as Budget: ... without changing the authoritative context token count
expose contextBudgetPressure on gateway session rows for UI/API consumers

This PR intentionally does not change compaction, truncation, or prompt-building behavior. It gives reviewers names and thresholds to tune before a later behavior PR starts acting on the pressure level.
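For reviewers tuning names and thresholds, the classification could look roughly like the sketch below. This is not the PR's code: the function name classifyBudgetPressure, the snapshot field names, and the threshold values (0.5, 0.75, 0.9) are all assumptions for illustration, since the actual resolveSessionContextBudgetPolicy(...) signature and thresholds are not shown in this description.

```typescript
// Hypothetical sketch only. Names and thresholds are illustrative assumptions,
// not the values used by resolveSessionContextBudgetPolicy(...).
type ContextBudgetPressure = "safe" | "watch" | "pressure" | "overflow-risk";

interface BudgetSnapshotSketch {
  // Estimated tokens in the prompt before it is sent (assumed field name).
  estimatedPromptTokens: number;
  // Prompt budget the estimate is compared against (assumed field name).
  promptBudgetTokens: number;
}

// Assumed thresholds, expressed as a fraction of the prompt budget.
const WATCH_AT = 0.5;
const PRESSURE_AT = 0.75;
const OVERFLOW_RISK_AT = 0.9;

function classifyBudgetPressure(
  snapshot: BudgetSnapshotSketch,
): ContextBudgetPressure | undefined {
  const { estimatedPromptTokens, promptBudgetTokens } = snapshot;
  // Without a usable estimate and budget there is nothing to classify.
  if (!Number.isFinite(estimatedPromptTokens) || !(promptBudgetTokens > 0)) {
    return undefined;
  }
  const ratio = estimatedPromptTokens / promptBudgetTokens;
  // Check the most severe level first so each estimate maps to one level.
  if (ratio >= OVERFLOW_RISK_AT) return "overflow-risk";
  if (ratio >= PRESSURE_AT) return "pressure";
  if (ratio >= WATCH_AT) return "watch";
  return "safe";
}

console.log(
  classifyBudgetPressure({ estimatedPromptTokens: 640_000, promptBudgetTokens: 1_000_000 }),
);
```

Returning undefined for an unusable snapshot keeps the sketch reporting-only: a consumer can omit the Budget field rather than show a guessed level.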

Refs #80594, #54996, #77992, #84490, #83177, #43009, #83526, #8635.

Verification

Behavior addressed: OpenClaw can now classify the stored pre-prompt estimate into a shared context budget pressure level, and status/gateway consumers can see that level without changing runtime behavior.

Real environment tested: Windows worktree C:\src\claws-hapi-budget2, stacked on #84785 (eb4b8af6f3). The worktree uses a local node_modules junction to the main checkout's installed dependencies; no lockfile or dependency install changes were made.

Exact steps or command run after this patch:

node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/budget3.tsbuildinfo
node_modules\.bin\oxfmt.CMD --check src/config/sessions/context-budget-policy.ts src/config/sessions/context-budget-policy.test.ts src/config/sessions.ts src/status/status-message.ts src/auto-reply/status.test.ts src/gateway/session-utils.ts src/gateway/session-utils.types.ts src/gateway/session-utils.test.ts
git diff --check
OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-policy node scripts/run-vitest.mjs src/config/sessions/context-budget-policy.test.ts
OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "uses estimated context budget status"
OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status-fresh node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "prefers fresh totalTokens over estimated context budget status"
OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-gateway node scripts/run-vitest.mjs src/gateway/session-utils.test.ts -t "session rows expose derived context budget pressure"

Evidence after fix: Focused tests prove the safe/watch/pressure/overflow-risk classifier, status rendering of Budget: safe/watch, fresh-token precedence, and gateway row exposure. tsgo, oxfmt --check, and git diff --check are clean.

Observed result after fix: Status can render lines such as Context: ~640k/1.0m (64% est) · Budget: watch, while fresh usage still renders its real total and only adds the budget pressure as separate policy metadata. Gateway rows expose contextBudgetPressure: "watch" for the same snapshot.
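The estimated status line quoted above can be reproduced with a small formatter. This is a sketch under assumptions: the helper names formatTokens and renderContextLine and the option names are hypothetical, and the non-estimated ("fresh") format is a guess, since only the estimated line appears in this description.

```typescript
// Hypothetical sketch of the status line. Helper and option names are
// assumptions; only the estimated output format is taken from the PR text.
function formatTokens(tokens: number): string {
  if (tokens >= 1_000_000) return `${(tokens / 1_000_000).toFixed(1)}m`;
  if (tokens >= 1_000) return `${Math.round(tokens / 1_000)}k`;
  return String(tokens);
}

interface ContextLineOptions {
  usedTokens: number;
  windowTokens: number;
  // True when usedTokens comes from the stored pre-prompt estimate.
  estimated: boolean;
  // Derived pressure level; shown as separate metadata after the token count.
  pressure?: string;
}

function renderContextLine(opts: ContextLineOptions): string {
  const { usedTokens, windowTokens, estimated, pressure } = opts;
  const percent = windowTokens > 0 ? Math.round((usedTokens / windowTokens) * 100) : 0;
  const approx = estimated ? "~" : "";
  const suffix = estimated ? " est" : "";
  const base =
    `Context: ${approx}${formatTokens(usedTokens)}/${formatTokens(windowTokens)}` +
    ` (${percent}%${suffix})`;
  // The pressure level never changes the token figures; it is only appended.
  return pressure ? `${base} · Budget: ${pressure}` : base;
}

console.log(
  renderContextLine({
    usedTokens: 640_000,
    windowTokens: 1_000_000,
    estimated: true,
    pressure: "watch",
  }),
);
// prints "Context: ~640k/1.0m (64% est) · Budget: watch"
```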

What was not tested: Broad pnpm check:changed and full Vitest files were not run locally. The focused node-wrapper tests were used because broad pnpm-gated proof is expensive in this Codex worktree and direct parallel Vitest has cache-race risk here.

Review Notes

Automated Codex review was attempted with codex review --uncommitted, but nested Codex could not inspect the diff because every shell command failed with Windows sandbox error CreateProcessAsUserW failed: 1312. I did a manual diff review after that and tightened the gateway type import to use the sessions barrel.

clawsweeper · 2026-05-21T04:10:58Z

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This stacked PR persists pre-prompt context budget snapshots, derives safe/watch/pressure/overflow-risk policy, renders Budget in status, and exposes contextBudgetPressure on gateway session rows.

Reproducibility: yes. Source-level reproduction is high confidence: heartbeat runs set preserveRuntimeModel, current main uses that to prevent heartbeat model/context bleed, and the PR writes contextBudgetStatus outside that guard. I did not run a live heartbeat scenario because this is a read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Summary: Not merge-ready: the idea is useful, but the patch has a blocking heartbeat session-state regression and only test-based proof.

Rank-up moves:

Fix contextBudgetStatus persistence so heartbeat runs cannot overwrite the main session's budget snapshot, and add a regression beside the existing heartbeat bleed tests.
Add real behavior proof from a redacted local run, such as openclaw status output or gateway session-row output showing the new pressure field after the patch.
Retarget or collapse the stack once the prerequisite budget PRs land.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: The PR body lists focused tests, formatting, typecheck, and sample expected text, but does not show an after-fix real openclaw status or gateway session-row run; contributor should add redacted terminal output, logs, or a screenshot/recording of the real behavior and then update the PR body to trigger re-review.

Risk before merge

If merged as-is, a background heartbeat run can persist its own contextBudgetStatus even though preserveRuntimeModel is intentionally preserving the main session's runtime model and context window, causing /status and gateway rows to show the wrong budget pressure for the user session.
This PR is still stacked on feat(agents): expose estimated context budget status #84785 and fix(agents): log pre-prompt compaction fits decisions #84676, so maintainers need the stack to land in order or collapse the diff before merge.
The safe/watch/pressure/overflow-risk thresholds are new user-visible policy vocabulary; the PR body correctly frames them as reviewer-tunable before later behavior starts acting on them.

Maintainer options:

Preserve Heartbeat Isolation (recommended)
Gate or preserve contextBudgetStatus under the same preserveRuntimeModel semantics and add a regression that a heartbeat result with contextBudgetStatus cannot overwrite the main session's budget state.
Land The Stack In Order
Wait for the prerequisite budget PRs to land or retarget so this branch only carries the final pressure-classification delta.
Accept New Policy Vocabulary Deliberately
Maintainers may accept the initial thresholds as reporting-only policy names, but that should be an explicit decision before later runtime behavior depends on them.

Next step before merge
This PR needs contributor real behavior proof and a session-state fix before maintainer review; ClawSweeper should not open a repair lane while the contributor proof gate is still unmet.

Security
Cleared: The diff does not add dependencies, workflows, secret handling, or new code-execution paths; the blocking issue is functional session-state correctness, not supply-chain/security.

Review findings

[P1] Preserve heartbeat budget snapshots — src/agents/command/session-store.ts:178-180

Review details

Best possible solution:

Keep the reporting-only budget surface, but make contextBudgetStatus obey the same heartbeat preservation boundary as runtime model/contextTokens and land it after the prerequisite budget stack is reviewed.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level reproduction is high confidence: heartbeat runs set preserveRuntimeModel, current main uses that to prevent heartbeat model/context bleed, and the PR writes contextBudgetStatus outside that guard. I did not run a live heartbeat scenario because this is a read-only review.

Is this the best way to solve the issue?

No, not yet: the reporting-only policy surface is a reasonable direction, but the session-store write needs to preserve heartbeat isolation before merge. The safer solution is to update or retain contextBudgetStatus only when the run is allowed to update runtime session identity, with focused regression coverage.
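A minimal sketch of the guard described above, assuming simplified shapes. The types and the applyRunResultSketch function are hypothetical stand-ins, not the code in src/agents/command/session-store.ts; only the field names contextBudgetStatus and contextTokens and the preserveRuntimeModel flag come from the review text.

```typescript
// Hypothetical sketch of the heartbeat isolation guard. The shapes below are
// simplified assumptions, not the real session-store types.
interface BudgetStatusSketch {
  estimatedPromptTokens: number;
  promptBudgetTokens: number;
}

interface SessionEntrySketch {
  model?: string;
  contextTokens?: number;
  contextBudgetStatus?: BudgetStatusSketch;
}

interface RunResultSketch {
  model: string;
  contextTokens: number;
  contextBudgetStatus?: BudgetStatusSketch;
}

function applyRunResultSketch(
  entry: SessionEntrySketch,
  result: RunResultSketch,
  opts: { preserveRuntimeModel?: boolean } = {},
): SessionEntrySketch {
  const next: SessionEntrySketch = { ...entry };
  // Heartbeat runs pass preserveRuntimeModel, so their model, context window,
  // and budget snapshot must not replace the main session's values.
  if (!opts.preserveRuntimeModel) {
    next.model = result.model;
    next.contextTokens = result.contextTokens;
    // Retain the existing snapshot when the run did not produce a new one.
    next.contextBudgetStatus = result.contextBudgetStatus ?? entry.contextBudgetStatus;
  }
  return next;
}

// Regression-style check: a heartbeat result cannot overwrite the main snapshot.
const mainEntry: SessionEntrySketch = {
  model: "main-model",
  contextTokens: 1_000_000,
  contextBudgetStatus: { estimatedPromptTokens: 640_000, promptBudgetTokens: 1_000_000 },
};
const heartbeatResult: RunResultSketch = {
  model: "heartbeat-model",
  contextTokens: 200_000,
  contextBudgetStatus: { estimatedPromptTokens: 5_000, promptBudgetTokens: 200_000 },
};
const afterHeartbeat = applyRunResultSketch(mainEntry, heartbeatResult, {
  preserveRuntimeModel: true,
});
console.log(afterHeartbeat.contextBudgetStatus?.estimatedPromptTokens);
```

Placing the budget snapshot inside the same branch as the model and context-window writes means the three fields cannot drift apart, which is the property a heartbeat regression test would assert.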

Label changes:

add P2: This is a normal-priority feature/bug-risk PR touching session status and gateway metadata with limited blast radius but real session-state merge risk.
add merge-risk: 🚨 session-state: The diff writes a new persisted session field and can mis-associate budget state with heartbeat runs if merged as-is.
add rating: 🧂 unranked krab: Proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab. Not merge-ready: the idea is useful, but the patch has a blocking heartbeat session-state regression and only test-based proof.
add status: 📣 needs proof: The PR body lists focused tests, formatting, typecheck, and sample expected text, but does not show an after-fix real openclaw status or gateway session-row run. The contributor should add redacted terminal output, logs, or a screenshot/recording of the real behavior, then update the PR body to trigger re-review.

Full review comments:

[P1] Preserve heartbeat budget snapshots — src/agents/command/session-store.ts:178-180
Heartbeat runs pass preserveRuntimeModel so that their temporary model and context window do not overwrite the main session's, but this new block persists contextBudgetStatus unconditionally. If a heartbeat emits a snapshot, /status and gateway rows can show the heartbeat model's budget pressure for the user session. Keep this write behind the same preservation semantics or retain the existing status, and add a heartbeat regression test.
Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

PR scope and proof: The live PR body says this is Budget 3 in a stack based on feat(agents): expose estimated context budget status #84785 and fix(agents): log pre-prompt compaction fits decisions #84676, adds reporting-only policy vocabulary, and lists focused tsgo, oxfmt, diff-check, and Vitest filter commands rather than a real status/gateway run. (7f8c30db2786)
Introduced write site: The PR writes result.meta.agentMeta.contextBudgetStatus into the session entry unconditionally after systemPromptReport handling. (src/agents/command/session-store.ts:178, 7f8c30db2786)
Heartbeat isolation contract: Current main already treats preserveRuntimeModel as a guard against heartbeat turns using a different model bleeding into the main session's perceived model/context window. (src/agents/command/session-store.ts:129, d1470360c420)
Heartbeat caller: The agent command path passes preserveRuntimeModel when bootstrapContextRunKind is heartbeat, so the new contextBudgetStatus write participates in that same heartbeat persistence path. (src/agents/agent-command.ts:1433, a30ac3f8d7cb)
Budget status consumers: The PR status and gateway row code consume the persisted contextBudgetStatus to display Budget and contextBudgetPressure, so stale or heartbeat-sourced snapshots become user/API-visible session state. (src/status/status-message.ts:869, 7f8c30db2786)
History routing: Local blame ties the heartbeat preserveRuntimeModel guard to Peter Steinberger, while GitHub path history shows recent status context-window work by Josh Lehman and gateway session-row work by Pavan Kumar Gondhi. (d1470360c420)

Likely related people:

steipete: Blame on the current preserveRuntimeModel guard and heartbeat bleed comments points to Peter Steinberger, making him the clearest routing candidate for the session-store persistence boundary. (role: recent session-store and heartbeat isolation contributor; confidence: high; commits: d1470360c420; files: src/agents/command/session-store.ts, src/agents/agent-command.ts)
jalehman: GitHub path history for status-message shows recent context-window status work in commit 4004c93, which is adjacent to the Budget status rendering surface. (role: recent status/context-window contributor; confidence: medium; commits: 4004c9342d7d; files: src/status/status-message.ts)
pgondhi987: GitHub path history for gateway session utilities shows recent session lookup/scoping work in commit 6a12c6f, adjacent to the new gateway row field. (role: recent gateway session-row contributor; confidence: medium; commits: 6a12c6f79915; files: src/gateway/session-utils.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a30ac3f8d7cb.

clawsweeper · 2026-05-21T04:17:11Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

Gio Della-Libera added 3 commits May 20, 2026 13:43

fix(agents): log pre-prompt compaction fits decisions

d003bc2

feat(agents): persist estimated context budget status

eb4b8af

feat(agents): classify context budget pressure

7f8c30d

openclaw-barnacle Bot added the gateway, agents, size: L, and maintainer labels May 21, 2026

giodl73-repo force-pushed the fix/68609-precheck-budget-log branch from d003bc2 to fc5969f May 21, 2026 04:19

giodl73-repo mentioned this pull request May 21, 2026

fix(agents): log pre-prompt compaction fits decisions #84676

Merged

giodl73-repo deleted the branch openclaw:fix/68609-precheck-budget-log May 21, 2026 04:53

giodl73-repo closed this May 21, 2026

giodl73-repo wants to merge 3 commits into openclaw:fix/68609-precheck-budget-log from giodl73-repo:budget-3-context-budget-pressure
