fix(codex): prefer native tokens for resume budget #83229
Codex review: needs real behavior proof before merge. Latest ClawSweeper review: 2026-05-22 22:30 UTC / May 22, 2026, 6:30 PM ET. Workflow note: future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, source-reproducible. Current main computes max(sessionTokens, nativeTokens), so a 96,000 mirrored total with a native last_token_usage of 12,000 crosses the 70,000 resume limit and rotates. I did not execute tests in this read-only review.
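The reproduction arithmetic can be checked with a small TypeScript sketch. It models only the two token-count expressions quoted in this review; RESUME_TOKEN_LIMIT, oldTokenCount, newTokenCount, and shouldRotate are hypothetical names, and treating "crosses the limit" as a strict greater-than comparison is an assumption.

```typescript
// Hypothetical sketch; these are NOT the actual OpenClaw identifiers.
// The 70,000 limit is quoted in the review; using ">" (not ">=") is an assumption.
const RESUME_TOKEN_LIMIT = 70_000;

// Pre-patch expression as described: the larger total always wins.
function oldTokenCount(sessionTokens: number, nativeTokens: number): number {
  return Math.max(sessionTokens, nativeTokens);
}

// Patched expression: native rollout usage wins whenever it is present.
function newTokenCount(sessionTokens: number, nativeTokens: number | undefined): number {
  return nativeTokens ?? sessionTokens;
}

// Assumed guard shape: rotate the thread when the count is over the limit.
function shouldRotate(tokenCount: number): boolean {
  return tokenCount > RESUME_TOKEN_LIMIT;
}

const mirrored = 96_000; // stale OpenClaw mirrored session total
const native = 12_000; // Codex last_token_usage after compaction

console.log("old:", oldTokenCount(mirrored, native), "rotate:", shouldRotate(oldTokenCount(mirrored, native)));
console.log("new:", newTokenCount(mirrored, native), "rotate:", shouldRotate(newTokenCount(mirrored, native)));
```

Under these assumptions, the max-based count keeps the stale 96,000 total and rotates, while the patched count uses the 12,000 native reading and resumes.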
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Best possible solution: Land the narrow native-token precedence fix after adding redacted live terminal/log output or a short recording that shows a stale mirrored Codex session resuming instead of rotating, while keeping the focused regression test.
Do we have a high-confidence way to reproduce the issue? Yes; it is source-reproducible, as described in the summary above.
Is this the best way to solve the issue? Yes, for the code path: preferring native rollout tokens when present is the narrowest maintainable fix, because current app-server code already treats last/current native usage as active context usage. The remaining blocker is proof quality, not the implementation shape.
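The "prefer native tokens when present" rule maps directly onto JavaScript's nullish coalescing operator. The following is a minimal sketch, assuming a missing native reading is represented as undefined; the helper name resumeTokenCount is hypothetical. It shows which value wins in each combination of inputs:

```typescript
// Hypothetical helper mirroring the one-line change `nativeTokens ?? sessionTokens`.
// Representing "no native usage reported" as undefined is an assumption.
function resumeTokenCount(
  sessionTokens: number | undefined,
  nativeTokens: number | undefined,
): number | undefined {
  // `??` falls back only when the left side is null or undefined,
  // so a present native count always wins, even when it is 0.
  return nativeTokens ?? sessionTokens;
}

const cases: Array<[number | undefined, number | undefined]> = [
  [96_000, 12_000], // both present: native wins
  [96_000, undefined], // no native usage: fall back to the mirrored total
  [undefined, 12_000], // only native usage
  [96_000, 0], // native present but zero: still native (`||` would pick 96_000)
  [undefined, undefined], // neither source available
];

for (const [session, native] of cases) {
  console.log(session, native, "->", resumeTokenCount(session, native));
}
```

Unlike the old max, a stale mirrored total can no longer override a present native reading; unlike `||`, `??` keeps a reported native count of 0. Whether a zero native count occurs in practice is not stated in the PR.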
Codex review notes: model gpt-5.5, reasoning high; reviewed against 8f8638393ef4.
@clawsweeper re-review
The Real behavior proof check is passing, and the PR body includes live local runtime evidence plus the focused regression lane. Please refresh the durable review verdict against the current proof.
🦞🧹 I asked ClawSweeper to review this item again.
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.
Summary
This is the narrow follow-up for the remaining Codex app-server resume-budget edge case. Upstream main already has the broader native rollout guard and dynamic tool-result cap; this patch only changes token precedence when both native rollout usage and mirrored OpenClaw session totals are present.
Real behavior proof
Behavior or issue addressed: OpenClaw can keep a high mirrored session token total after Codex app-server compaction. Current main reads native last_token_usage but still evaluates the resume guard with max(sessionTokens, nativeTokens), so a stale mirrored total can rotate a healthy compacted native thread. This patch prefers nativeTokens ?? sessionTokens.
Real environment tested: Han's local OpenClaw install on macOS, OpenClaw 2026.5.16-beta.3, with the Gateway running against the main agent with WebChat/Codex app-server bindings.
Exact steps or command run after this patch: Applied the same one-line runtime patch locally, restarted and checked OpenClaw, verified that the installed runtime contains const tokenCount = nativeTokens ?? sessionTokens;, and ran the focused app-server regression lane in this rebased PR worktree.
Evidence after fix: Live terminal output from the local OpenClaw setup after the patch:
Observed result after fix: The locally installed runtime now uses native Codex rollout token usage first, so a compacted native thread is no longer forced to rotate solely because OpenClaw's mirrored session totals are stale. The Gateway stayed healthy after the patch. The rebased PR regression also passed:
Test Files 1 passed (1); Tests 7 passed | 151 skipped (158).
What was not tested: The full repository test suite and a live upstream CI WebChat session. Only the live local Gateway/runtime check and the focused app-server regression were tested.
Supplemental validation
Supplemental results: oxfmt passed, git diff whitespace passed, focused Vitest passed.