feat(agents): classify context budget pressure #84800
Codex review: needs real behavior proof before merge.
Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes. Source-level reproduction is high confidence: heartbeat runs set `preserveRuntimeModel`, current main uses that to prevent heartbeat model/context bleed, and the PR writes `contextBudgetStatus` outside that guard. I did not run a live heartbeat scenario because this is a read-only review.
Review details
Best possible solution: Keep the reporting-only budget surface, but make `contextBudgetStatus` obey the same heartbeat preservation boundary as runtime model/`contextTokens`, and land it after the prerequisite budget stack is reviewed.
Do we have a high-confidence way to reproduce the issue? Yes, source-level reproduction is high confidence: heartbeat runs set `preserveRuntimeModel`, current main uses that to prevent heartbeat model/context bleed, and the PR writes `contextBudgetStatus` outside that guard. I did not run a live heartbeat scenario because this is a read-only review.
Is this the best way to solve the issue? No, not yet: the reporting-only policy surface is a reasonable direction, but the session-store write needs to preserve heartbeat isolation before merge. The safer solution is to update or retain `contextBudgetStatus` only when the run is allowed to update runtime session identity, with focused regression coverage.
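The guard the review asks for might look roughly like the sketch below. It is illustrative only: `preserveRuntimeModel`, `contextTokens`, and `contextBudgetStatus` are names from the review, while `SessionEntry`, `RunResult`, `RunOptions`, `applyRunToSession`, and the fields inside `ContextBudgetStatus` are hypothetical stand-ins, not the actual OpenClaw session-store code.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
interface ContextBudgetStatus {
  estimatedTokens: number;
  contextWindow: number;
}

interface SessionEntry {
  model?: string;
  contextTokens?: number;
  contextBudgetStatus?: ContextBudgetStatus;
}

interface RunResult {
  model: string;
  contextTokens: number;
  contextBudgetStatus?: ContextBudgetStatus;
}

interface RunOptions {
  // Assumed to be set by heartbeat runs so they do not overwrite
  // the session's runtime identity.
  preserveRuntimeModel?: boolean;
}

// Hypothetical helper: folds one run's results into the stored session entry.
function applyRunToSession(
  entry: SessionEntry,
  result: RunResult,
  options: RunOptions,
): SessionEntry {
  if (options.preserveRuntimeModel) {
    // Heartbeat path: model, contextTokens, and the budget snapshot
    // all stay exactly as stored.
    return { ...entry };
  }
  return {
    ...entry,
    model: result.model,
    contextTokens: result.contextTokens,
    // Retain the previous snapshot when this run produced none.
    contextBudgetStatus: result.contextBudgetStatus ?? entry.contextBudgetStatus,
  };
}
```

Keeping all three fields behind one condition means a heartbeat run cannot update the budget snapshot while leaving the model and token count untouched, which is the mismatch the review describes.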
Overall correctness: patch is incorrect
Codex review notes: model gpt-5.5, reasoning high; reviewed against a30ac3f8d7cb.
d003bc2 to fc5969f
Summary
Budget 3 in the context-budget stack, logically based on #84785. GitHub base is #84676 because #84785 is currently a fork-head branch; after #84785 lands or its branch is available upstream, this diff should collapse to the final budget-pressure commit.
This keeps budget management reporting-only and adds the first shared policy vocabulary on top of the persisted `contextBudgetStatus` snapshot:
- `resolveSessionContextBudgetPolicy(...)` with conservative pressure levels: `safe`, `watch`, `pressure`, and `overflow-risk`
- `Budget: ...` in status output, without changing the authoritative context token count
- `contextBudgetPressure` on gateway session rows for UI/API consumers

This PR intentionally does not change compaction, truncation, or prompt-building behavior. It gives reviewers names and thresholds to tune before a later behavior PR starts acting on the pressure level.
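As a rough illustration of the reporting-only surface described above, the sketch below classifies a snapshot into the four pressure levels, renders a status line, and derives a gateway row field. The threshold ratios, the snapshot field names, and the helpers `classifyPressure`, `formatTokens`, `formatStatusLine`, and `toGatewayRow` are all assumptions made for this example; the PR's `resolveSessionContextBudgetPolicy(...)` may use different inputs and cut-offs.

```typescript
type ContextBudgetPressure = "safe" | "watch" | "pressure" | "overflow-risk";

// Hypothetical snapshot shape; the persisted contextBudgetStatus may differ.
interface ContextBudgetStatus {
  estimatedTokens: number;
  contextWindow: number;
}

// Assumed thresholds, chosen only so that a 64% estimate maps to "watch".
const WATCH_RATIO = 0.6;
const PRESSURE_RATIO = 0.8;
const OVERFLOW_RISK_RATIO = 0.95;

function classifyPressure(status: ContextBudgetStatus): ContextBudgetPressure | undefined {
  if (status.contextWindow <= 0) return undefined; // nothing sensible to classify
  const ratio = status.estimatedTokens / status.contextWindow;
  if (ratio >= OVERFLOW_RISK_RATIO) return "overflow-risk";
  if (ratio >= PRESSURE_RATIO) return "pressure";
  if (ratio >= WATCH_RATIO) return "watch";
  return "safe";
}

// Compact token formatting, e.g. 640000 -> "640k", 1000000 -> "1.0m".
function formatTokens(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}m`;
  if (n >= 1_000) return `${Math.round(n / 1_000)}k`;
  return String(n);
}

// Fresh usage, when available, stays the authoritative number; the pressure
// level is appended as separate policy metadata from the stored estimate.
function formatStatusLine(status: ContextBudgetStatus, freshTotalTokens?: number): string {
  const estimated = freshTotalTokens === undefined;
  const used = freshTotalTokens ?? status.estimatedTokens;
  const percent = Math.round((used / status.contextWindow) * 100);
  const context =
    `Context: ${estimated ? "~" : ""}${formatTokens(used)}/` +
    `${formatTokens(status.contextWindow)} (${percent}%${estimated ? " est" : ""})`;
  const pressure = classifyPressure(status);
  return pressure ? `${context} · Budget: ${pressure}` : context;
}

// Hypothetical gateway row carrying only the derived field.
function toGatewayRow(key: string, status?: ContextBudgetStatus) {
  return { key, contextBudgetPressure: status ? classifyPressure(status) : undefined };
}
```

Under these assumed thresholds, a snapshot of 640,000 estimated tokens against a 1,000,000-token window classifies as `watch`, and nothing in the sketch touches compaction, truncation, or prompt building.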
Refs #80594, #54996, #77992, #84490, #83177, #43009, #83526, #8635.
Verification
Behavior addressed: OpenClaw can now classify the stored pre-prompt estimate into a shared context budget pressure level, and status/gateway consumers can see that level without changing runtime behavior.
Real environment tested: Windows worktree `C:\src\claws-hapi-budget2`, stacked on #84785 (eb4b8af6f3). The worktree uses a local `node_modules` junction to the main checkout's installed dependencies; no lockfile or dependency install changes were made.
Exact steps or command run after this patch:
- `node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/budget3.tsbuildinfo`
- `node_modules\.bin\oxfmt.CMD --check src/config/sessions/context-budget-policy.ts src/config/sessions/context-budget-policy.test.ts src/config/sessions.ts src/status/status-message.ts src/auto-reply/status.test.ts src/gateway/session-utils.ts src/gateway/session-utils.types.ts src/gateway/session-utils.test.ts`
- `git diff --check`
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-policy node scripts/run-vitest.mjs src/config/sessions/context-budget-policy.test.ts`
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "uses estimated context budget status"`
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-status-fresh node scripts/run-vitest.mjs src/auto-reply/status.test.ts -t "prefers fresh totalTokens over estimated context budget status"`
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=.artifacts/vitest-cache/budget3-gateway node scripts/run-vitest.mjs src/gateway/session-utils.test.ts -t "session rows expose derived context budget pressure"`

Evidence after fix: Focused tests prove the safe/watch/pressure/overflow-risk classifier, status rendering of `Budget: safe/watch`, fresh-token precedence, and gateway row exposure. `tsgo`, `oxfmt --check`, and `git diff --check` are clean.
Observed result after fix: Status can render lines such as `Context: ~640k/1.0m (64% est) · Budget: watch`, while fresh usage still renders its real total and only adds the budget pressure as separate policy metadata. Gateway rows expose `contextBudgetPressure: "watch"` for the same snapshot.
What was not tested: Broad `pnpm check:changed` and full Vitest files were not run locally. The focused node-wrapper tests were used because broad pnpm-gated proof is expensive in this Codex worktree and direct parallel Vitest has cache-race risk here.
Review Notes
Automated Codex review was attempted with `codex review --uncommitted`, but nested Codex could not inspect the diff because every shell command failed with the Windows sandbox error `CreateProcessAsUserW failed: 1312`. I did a manual diff review after that and tightened the gateway type import to use the sessions barrel.