fix(codex): prefer native tokens for resume budget by hansolo949 · Pull Request #83229 · openclaw/openclaw

hansolo949 · 2026-05-17T18:26:25Z

Summary

This is the narrow follow-up for the remaining Codex app-server resume-budget edge case. Upstream main already has the broader native rollout guard and dynamic tool-result cap; this patch only changes token precedence when both native rollout usage and mirrored OpenClaw session totals are present.

Real behavior proof

Behavior or issue addressed: OpenClaw can keep a high mirrored session token total after Codex app-server compaction. Current main reads native last_token_usage, but still evaluates the resume guard with max(sessionTokens, nativeTokens), so a stale mirrored total can rotate a healthy compacted native thread. This patch prefers nativeTokens ?? sessionTokens.

Real environment tested: Han local OpenClaw install on macOS, OpenClaw 2026.5.16-beta.3, Gateway running against the main agent with WebChat/Codex app-server bindings.

Exact steps or command run after this patch: Applied the same one-line runtime patch locally, restarted/checked OpenClaw, verified the installed runtime contains const tokenCount = nativeTokens ?? sessionTokens;, and ran the focused app-server regression lane in this rebased PR worktree.

Evidence after fix: Live terminal output from the local OpenClaw setup after the patch:

$ rg -n "const tokenCount = nativeTokens \\?\\? sessionTokens" /usr/local/lib/node_modules/openclaw/dist/run-attempt-DEhr_oag.js
/usr/local/lib/node_modules/openclaw/dist/run-attempt-DEhr_oag.js:2543:\tconst tokenCount = nativeTokens ?? sessionTokens;

$ openclaw health --json | jq '{ok,eventLoop:.eventLoop,telegram:.channels.telegram.connected}'
{
  "ok": true,
  "eventLoop": {
    "degraded": false,
    "reasons": [],
    "delayP99Ms": 22.8,
    "utilization": 0.021,
    "cpuCoreRatio": 0.022
  },
  "telegram": true
}

Observed result after fix: The local installed runtime now uses native Codex rollout token usage first, so a compacted native thread is not forced to rotate solely because OpenClaw mirrored session totals are stale. Gateway stayed healthy after the patch. The rebased PR regression also passed: Test Files 1 passed (1); Tests 7 passed | 151 skipped (158).

What was not tested: Full repository test suite and a live upstream CI WebChat session. The live local Gateway/runtime check and focused app-server regression were tested.

Supplemental validation

node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
git diff --check
tmp=$(mktemp); printf '%s\n' '["codex/src/app-server/run-attempt.test.ts"]' > "$tmp"
OPENCLAW_VITEST_INCLUDE_FILE="$tmp" OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts --testNamePattern 'native rollout|stale mirrored'

Supplemental results: oxfmt passed, git diff whitespace passed, focused Vitest passed.

clawsweeper · 2026-05-17T18:27:36Z

Codex review: needs real behavior proof before merge.

Latest ClawSweeper review: 2026-05-22 22:30 UTC / May 22, 2026, 6:30 PM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR changes the Codex app-server resume-budget guard to prefer native rollout token usage over mirrored session totals and adds a stale-token regression test.

Reproducibility: yes. source-reproducible: current main computes max(sessionTokens, nativeTokens), so a 96,000 mirrored total with native last_token_usage of 12,000 crosses the 70,000 resume limit and rotates. I did not execute tests in this read-only review.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Summary: The patch is clean and focused, but merge confidence is capped because real behavior proof does not exercise the changed resume path.

Rank-up moves:

Add redacted live terminal output, logs, or a short recording showing a stale mirrored session total with native last_token_usage resuming without thread/start.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: Terminal proof shows the patched installed file and a healthy Gateway, but it does not show the actual stale mirrored-token resume behavior after the fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

Contributor proof shows the patched installed runtime and Gateway health, but not the real stale mirrored-token resume path that this PR changes.
No tests were executed in this read-only review; validation relies on source inspection and the PR-reported focused regression lane.

Maintainer options:

Decide the mitigation before merge
Land the narrow native-token precedence fix after adding redacted live terminal/log output or a short recording showing a stale mirrored Codex session resuming instead of rotating, while keeping the focused regression test.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
Needs contributor-supplied real behavior proof of the stale-token resume path; there is no safe ClawSweeper repair lane for the contributor's local setup.

Security
Cleared: No concrete security or supply-chain concern found in the two-file Codex app-server runtime/test diff.

Review details

Best possible solution:

Land the narrow native-token precedence fix after adding redacted live terminal/log output or a short recording showing a stale mirrored Codex session resuming instead of rotating, while keeping the focused regression test.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main computes max(sessionTokens, nativeTokens), so a 96,000 mirrored total with native last_token_usage of 12,000 crosses the 70,000 resume limit and rotates. I did not execute tests in this read-only review.

Is this the best way to solve the issue?

Yes for the code path: preferring native rollout tokens when present is the narrowest maintainable fix because current app-server code already treats last/current native usage as active context usage. The remaining blocker is proof quality, not the implementation shape.

Label changes:

add rating: 🦪 silver shellfish: Current PR rating is 🦪 silver shellfish because proof is 🦪 silver shellfish, patch quality is 🐚 platinum hermit, and The patch is clean and focused, but merge confidence is capped because real behavior proof does not exercise the changed resume path.
add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Terminal proof shows the patched installed file and a healthy Gateway, but it does not show the actual stale mirrored-token resume behavior after the fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
remove impact:session-state: Current review selected no impact labels.

Label justifications:

P2: The PR fixes a bounded Codex app-server session-state edge case with limited blast radius and clear regression coverage.
rating: 🦪 silver shellfish: Current PR rating is 🦪 silver shellfish because proof is 🦪 silver shellfish, patch quality is 🐚 platinum hermit, and The patch is clean and focused, but merge confidence is capped because real behavior proof does not exercise the changed resume path.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Terminal proof shows the patched installed file and a healthy Gateway, but it does not show the actual stale mirrored-token resume behavior after the fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

What I checked:

Current-main behavior: Current main computes the resume guard with max(sessionTokens, nativeTokens), so a stale mirrored session total can still force a fresh Codex thread even when native last_token_usage is compacted and low. (extensions/codex/src/app-server/run-attempt.ts:790, 8f8638393ef4)
PR source change: The PR changes the guard to nativeTokens ?? sessionTokens, preserving the session-total fallback only when native rollout token usage is unavailable. (extensions/codex/src/app-server/run-attempt.ts:708, 147b9a9fd8c5)
Regression coverage: The added test covers a mirrored total of 96,000 with native last_token_usage of 12,000 and asserts thread/resume is used without thread/start. (extensions/codex/src/app-server/run-attempt.test.ts:6207, 147b9a9fd8c5)
Token-usage contract in current code: Adjacent Codex app-server code already treats last/current token usage fields, including last_token_usage, as current native usage rather than cumulative total usage. (extensions/codex/src/app-server/compact.ts:643, 8f8638393ef4)
Prior provenance: Commit 9a94194 established the same direction for Codex usage accounting by avoiding cumulative Codex usage as context. (extensions/codex/src/app-server/event-projector.ts:82, 9a94194329de)
Proof gate: The PR body includes terminal output showing the patched installed file and a healthy Gateway, but not a real stale mirrored-token Codex session resuming instead of rotating. (147b9a9fd8c5)

Likely related people:

steipete: Peter Steinberger authored the Codex app-server controls and major app-server module/lifecycle refactors, and shortlog shows the most commits in the central app-server files. (role: feature owner and heavy area contributor; confidence: high; commits: 31a0b7bd42a5, 8d72aafdbb8d, 659bcc5e5b59; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts, extensions/codex/src/app-server/event-projector.ts)
cyrusaf: Cyrus Forbes authored the prior merged fix that avoided treating cumulative Codex token usage as current context usage. (role: prior token-usage fix author; confidence: high; commits: 9a94194329de; files: extensions/codex/src/app-server/event-projector.ts, extensions/codex/src/app-server/event-projector.test.ts)
vincentkoc: Vincent Koc recently worked on Codex app-server startup/auth handling and shared run-attempt harness coverage around the same runtime path. (role: recent adjacent Codex app-server contributor; confidence: medium; commits: f1cc8f0cfc7c, 859eb0666282, b5dfeaab4ce3; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8f8638393ef4.

hansolo949 · 2026-05-22T22:24:45Z

@clawsweeper re-review

The Real behavior proof check is passing and the PR body includes live local runtime evidence plus the focused regression lane. Please refresh the durable review verdict against the current proof.

clawsweeper · 2026-05-22T22:24:48Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26314940159
Updated: 2026-05-22T22:32:46.414Z

clawsweeper · 2026-05-22T22:32:37Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

fix(codex): prefer native tokens for resume budget

147b9a9

openclaw-barnacle Bot added extensions: codex size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 17, 2026

clawsweeper Bot added P2 Normal backlog priority with limited blast radius. impact:session-state Session, memory, transcript, context, or agent state can drift or corrupt. labels May 17, 2026

Steady-ai mentioned this pull request May 27, 2026

[codex] fix Codex budget native compaction trigger #87158

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(codex): prefer native tokens for resume budget#83229

fix(codex): prefer native tokens for resume budget#83229
hansolo949 wants to merge 1 commit into
openclaw:mainfrom
hansolo949:fix/codex-context-budget-native-precedence

hansolo949 commented May 17, 2026

Uh oh!

clawsweeper Bot commented May 17, 2026 •

edited

Loading

Uh oh!

hansolo949 commented May 22, 2026

Uh oh!

clawsweeper Bot commented May 22, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

hansolo949 commented May 17, 2026

Summary

Real behavior proof

Supplemental validation

Uh oh!

clawsweeper Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hansolo949 commented May 22, 2026

Uh oh!

clawsweeper Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clawsweeper Bot commented May 17, 2026 •

edited

Loading

clawsweeper Bot commented May 22, 2026 •

edited

Loading