
feat(agents): classify context budget pressure #84800

Closed
giodl73-repo wants to merge 3 commits into openclaw:fix/68609-precheck-budget-log from giodl73-repo:budget-3-context-budget-pressure

giodl73-repo commented May 21, 2026

Contributor

Summary

Budget 3 in the context-budget stack, logically based on #84785. GitHub base is #84676 because #84785 is currently a fork-head branch; after #84785 lands or its branch is available upstream, this diff should collapse to the final budget-pressure commit.

This keeps budget management reporting-only and adds the first shared policy vocabulary on top of the persisted contextBudgetStatus snapshot:

  • add resolveSessionContextBudgetPolicy(...) with conservative pressure levels: safe, watch, pressure, and overflow-risk
  • classify against the prompt budget before reserve, so the policy reflects actual pre-prompt room instead of only raw context-window percentage
  • show the derived pressure in status as Budget: ... without changing the authoritative context token count
  • expose contextBudgetPressure on gateway session rows for UI/API consumers

This PR intentionally does not change compaction, truncation, or prompt-building behavior. It gives reviewers names and thresholds to tune before a later behavior PR starts acting on the pressure level.
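
As a rough illustration of the classifier described above, the sketch below maps a stored pre-prompt estimate to one of the four pressure levels. The type names, field names, the function name `classifyBudgetPressure`, and the 60/80/95% thresholds are all assumptions made for illustration; the real `resolveSessionContextBudgetPolicy(...)` signature and thresholds live in the diff and may differ.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions,
// not the values used by resolveSessionContextBudgetPolicy(...).
type ContextBudgetPressure = "safe" | "watch" | "pressure" | "overflow-risk";

interface ContextBudgetSnapshot {
  /** Estimated tokens in the prompt before it is sent (assumed field name). */
  estimatedPromptTokens: number;
  /** Prompt budget the estimate is compared against (assumed field name). */
  promptBudgetTokens: number;
}

// Assumed thresholds, expressed as a fraction of the prompt budget.
const WATCH_RATIO = 0.6;
const PRESSURE_RATIO = 0.8;
const OVERFLOW_RISK_RATIO = 0.95;

function classifyBudgetPressure(
  snapshot: ContextBudgetSnapshot,
): ContextBudgetPressure | undefined {
  const { estimatedPromptTokens, promptBudgetTokens } = snapshot;
  // Without a usable budget or estimate there is nothing to classify.
  if (!Number.isFinite(estimatedPromptTokens) || !(promptBudgetTokens > 0)) {
    return undefined;
  }
  const ratio = estimatedPromptTokens / promptBudgetTokens;
  // Check the most severe level first so each estimate lands in one bucket.
  if (ratio >= OVERFLOW_RISK_RATIO) return "overflow-risk";
  if (ratio >= PRESSURE_RATIO) return "pressure";
  if (ratio >= WATCH_RATIO) return "watch";
  return "safe";
}
```

Because the function only reads a snapshot and returns a label, it stays reporting-only: nothing here triggers compaction or truncation.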

Refs #80594, #54996, #77992, #84490, #83177, #43009, #83526, #8635.

Verification

Behavior addressed: OpenClaw can now classify the stored pre-prompt estimate into a shared context budget pressure level, and status/gateway consumers can see that level without changing runtime behavior.

Real environment tested: Windows worktree C:\src\claws-hapi-budget2, stacked on #84785 (eb4b8af6f3). The worktree uses a local node_modules junction to the main checkout's installed dependencies; no lockfile or dependency install changes were made.

Exact steps or command run after this patch:

  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/budget3.tsbuildinfo
  • node_modules\.bin\oxfmt.CMD --check src/config/sessions/context-budget-policy.ts src/config/sessions/context-budget-policy.test.ts src/config/sessions.ts src/status/status-message.ts src/auto-reply/status.test.ts src/gateway/session-utils.ts src/gateway/session-utils.types.ts src/gateway/session-utils.test.ts
  • git diff --check
  • OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-policy node scripts/run-vitest.mjs src/config/sessions/context-budget-policy.test.ts
  • OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "uses estimated context budget status"
  • OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status-fresh node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "prefers fresh totalTokens over estimated context budget status"
  • OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-gateway node scripts/run-vitest.mjs src/gateway/session-utils.test.ts -t "session rows expose derived context budget pressure"

Evidence after fix: Focused tests prove the safe/watch/pressure/overflow-risk classifier, status rendering of Budget: safe/watch, fresh-token precedence, and gateway row exposure. tsgo, oxfmt --check, and git diff --check are clean.

Observed result after fix: Status can render lines such as Context: ~640k/1.0m (64% est) · Budget: watch, while fresh usage still renders its real total and only adds the budget pressure as separate policy metadata. Gateway rows expose contextBudgetPressure: "watch" for the same snapshot.
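
To make the rendered line above concrete, here is a minimal sketch of how such a status line could be assembled from an estimate, a context window, and an optional pressure label. The helper names `formatTokensShort` and `renderContextLine` and the option fields are hypothetical; the actual rendering lives in src/status/status-message.ts and may be structured differently.

```typescript
// Hypothetical sketch of the status rendering; helper and field names are
// assumptions, not the code in src/status/status-message.ts.
type ContextBudgetPressure = "safe" | "watch" | "pressure" | "overflow-risk";

// Compact token formatting: thousands as "k", millions as one-decimal "m".
function formatTokensShort(tokens: number): string {
  if (tokens >= 1_000_000) return `${(tokens / 1_000_000).toFixed(1)}m`;
  if (tokens >= 1_000) return `${Math.round(tokens / 1_000)}k`;
  return String(tokens);
}

function renderContextLine(opts: {
  estimatedTokens: number;
  contextWindowTokens: number;
  pressure?: ContextBudgetPressure;
}): string {
  const percent = Math.round((opts.estimatedTokens / opts.contextWindowTokens) * 100);
  const base =
    `Context: ~${formatTokensShort(opts.estimatedTokens)}` +
    `/${formatTokensShort(opts.contextWindowTokens)} (${percent}% est)`;
  // The pressure label is appended as separate metadata; the token count
  // portion of the line is left unchanged.
  return opts.pressure ? `${base} · Budget: ${opts.pressure}` : base;
}

console.log(
  renderContextLine({
    estimatedTokens: 640_000,
    contextWindowTokens: 1_000_000,
    pressure: "watch",
  }),
);
// prints "Context: ~640k/1.0m (64% est) · Budget: watch"
```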

What was not tested: Broad pnpm check:changed and full Vitest files were not run locally. The focused node-wrapper tests were used because broad pnpm-gated proof is expensive in this Codex worktree and direct parallel Vitest has cache-race risk here.

Review Notes

Automated Codex review was attempted with codex review --uncommitted, but nested Codex could not inspect the diff because every shell command failed with Windows sandbox error CreateProcessAsUserW failed: 1312. I did a manual diff review after that and tightened the gateway type import to use the sessions barrel.

openclaw-barnacle Bot added the gateway (Gateway runtime), agents (Agent runtime and tooling), size: L, and maintainer (Maintainer-authored PR) labels May 21, 2026

clawsweeper Bot commented May 21, 2026

Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This stacked PR persists pre-prompt context budget snapshots, derives safe/watch/pressure/overflow-risk policy, renders Budget in status, and exposes contextBudgetPressure on gateway session rows.

Reproducibility: yes. Source-level reproduction is high confidence: heartbeat runs set preserveRuntimeModel, current main uses that flag to prevent heartbeat model/context bleed, and the PR writes contextBudgetStatus outside that guard. I did not run a live heartbeat scenario because this is a read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Summary: Not merge-ready: the idea is useful, but the patch has a blocking heartbeat session-state regression and only test-based proof.

Rank-up moves:

  • Fix contextBudgetStatus persistence so heartbeat runs cannot overwrite the main session's budget snapshot, and add a regression beside the existing heartbeat bleed tests.
  • Add real behavior proof from a redacted local run, such as openclaw status output or gateway session-row output showing the new pressure field after the patch.
  • Retarget or collapse the stack once the prerequisite budget PRs land.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: The PR body lists focused tests, formatting, typecheck, and sample expected text, but does not show an after-fix real openclaw status or gateway session-row run; contributor should add redacted terminal output, logs, or a screenshot/recording of the real behavior and then update the PR body to trigger re-review.

Risk before merge

  • If merged as-is, a background heartbeat run can persist its own contextBudgetStatus even though preserveRuntimeModel is intentionally preserving the main session's runtime model and context window, causing /status and gateway rows to show the wrong budget pressure for the user session.
  • This PR is still stacked on #84785 ("feat(agents): expose estimated context budget status") and #84676 ("fix(agents): log pre-prompt compaction fits decisions"), so maintainers need the stack to land in order or collapse the diff before merge.
  • The safe/watch/pressure/overflow-risk thresholds are new user-visible policy vocabulary; the PR body correctly frames them as reviewer-tunable before later behavior starts acting on them.

Maintainer options:

  1. Preserve Heartbeat Isolation (recommended)
    Gate or preserve contextBudgetStatus under the same preserveRuntimeModel semantics and add a regression that a heartbeat result with contextBudgetStatus cannot overwrite the main session's budget state.
  2. Land The Stack In Order
    Wait for the prerequisite budget PRs to land or retarget so this branch only carries the final pressure-classification delta.
  3. Accept New Policy Vocabulary Deliberately
    Maintainers may accept the initial thresholds as reporting-only policy names, but that should be an explicit decision before later runtime behavior depends on them.

Next step before merge
This PR needs contributor real behavior proof and a session-state fix before maintainer review; ClawSweeper should not open a repair lane while the contributor proof gate is still unmet.

Security
Cleared: The diff does not add dependencies, workflows, secret handling, or new code-execution paths; the blocking issue is functional session-state correctness, not supply-chain/security.

Review findings

  • [P1] Preserve heartbeat budget snapshots — src/agents/command/session-store.ts:178-180
Review details

Best possible solution:

Keep the reporting-only budget surface, but make contextBudgetStatus obey the same heartbeat preservation boundary as runtime model/contextTokens and land it after the prerequisite budget stack is reviewed.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level reproduction is high confidence: heartbeat runs set preserveRuntimeModel, current main uses that to prevent heartbeat model/context bleed, and the PR writes contextBudgetStatus outside that guard. I did not run a live heartbeat scenario because this is a read-only review.

Is this the best way to solve the issue?

No, not yet: the reporting-only policy surface is a reasonable direction, but the session-store write needs to preserve heartbeat isolation before merge. The safer solution is to update or retain contextBudgetStatus only when the run is allowed to update runtime session identity, with focused regression coverage.
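
The guard described above could look roughly like the sketch below. The `SessionEntry`, `RunResult`, and `BudgetStatus` shapes and the function name `applyRunToSession` are simplified, hypothetical stand-ins; only the `preserveRuntimeModel` flag and the `contextBudgetStatus` field name come from this review, and the real session-store code is more involved.

```typescript
// Hypothetical sketch: simplified stand-ins, not the real types or logic in
// src/agents/command/session-store.ts.
interface BudgetStatus {
  estimatedPromptTokens: number;
  promptBudgetTokens: number;
}

interface SessionEntry {
  model?: string;
  contextTokens?: number;
  contextBudgetStatus?: BudgetStatus;
}

interface RunResult {
  model: string;
  contextTokens: number;
  contextBudgetStatus?: BudgetStatus;
}

function applyRunToSession(
  existing: SessionEntry,
  result: RunResult,
  opts: { preserveRuntimeModel?: boolean } = {},
): SessionEntry {
  if (opts.preserveRuntimeModel) {
    // Heartbeat-style run: keep the main session's runtime identity,
    // including its budget snapshot, untouched.
    return { ...existing };
  }
  return {
    ...existing,
    model: result.model,
    contextTokens: result.contextTokens,
    // Retain the previous snapshot when the run did not produce a new one.
    contextBudgetStatus: result.contextBudgetStatus ?? existing.contextBudgetStatus,
  };
}
```

A regression test under these assumptions would apply a heartbeat result carrying its own snapshot with `preserveRuntimeModel: true` and assert that the main session's `contextBudgetStatus` is unchanged.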

Label changes:

  • add P2: This is a normal-priority feature/bug-risk PR touching session status and gateway metadata with limited blast radius but real session-state merge risk.
  • add merge-risk: 🚨 session-state: The diff writes a new persisted session field and can mis-associate budget state with heartbeat runs if merged as-is.
  • add rating: 🧂 unranked krab: The PR is rated 🧂 unranked krab because proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab. It is not merge-ready: the idea is useful, but the patch has a blocking heartbeat session-state regression and only test-based proof.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body lists focused tests, formatting, typecheck, and sample expected text, but does not show an after-fix real openclaw status or gateway session-row run; the contributor should add redacted terminal output, logs, or a screenshot/recording of the real behavior and then update the PR body to trigger re-review.

Full review comments:

  • [P1] Preserve heartbeat budget snapshots — src/agents/command/session-store.ts:178-180
    Heartbeat runs pass preserveRuntimeModel so that their temporary model and context window do not overwrite the main session's, but this new block persists contextBudgetStatus unconditionally. If a heartbeat emits a snapshot, /status and gateway rows can show the heartbeat model's budget pressure for the user session. Keep this write behind the same preservation semantics or retain the existing status, and add a heartbeat regression test.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

Likely related people:

  • steipete: Blame on the current preserveRuntimeModel guard and heartbeat bleed comments points to Peter Steinberger, making him the clearest routing candidate for the session-store persistence boundary. (role: recent session-store and heartbeat isolation contributor; confidence: high; commits: d1470360c420; files: src/agents/command/session-store.ts, src/agents/agent-command.ts)
  • jalehman: GitHub path history for status-message shows recent context-window status work in commit 4004c93, which is adjacent to the Budget status rendering surface. (role: recent status/context-window contributor; confidence: medium; commits: 4004c9342d7d; files: src/status/status-message.ts)
  • pgondhi987: GitHub path history for gateway session utilities shows recent session lookup/scoping work in commit 6a12c6f, adjacent to the new gateway row field. (role: recent gateway session-row contributor; confidence: medium; commits: 6a12c6f79915; files: src/gateway/session-utils.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a30ac3f8d7cb.

clawsweeper Bot added the following labels May 21, 2026:
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)

clawsweeper Bot commented May 21, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

giodl73-repo force-pushed the fix/68609-precheck-budget-log branch from d003bc2 to fc5969f May 21, 2026 04:19
giodl73-repo deleted the branch openclaw:fix/68609-precheck-budget-log May 21, 2026 04:53

Labels

  • agents (Agent runtime and tooling)
  • gateway (Gateway runtime)
  • maintainer (Maintainer-authored PR)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • size: L
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
