fix(codex): make native thread token guard configurable#86069
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 26, 2026, 1:27 PM ET / 17:27 UTC. Summary PR surface: Source +63, Tests +257. Total +320 across 3 files. Reproducibility: no. high-confidence real reproduction is in hand. Source and tests show current main rotates at 70k and the PR simulates the new guard behavior, but the latest no-config rewrite was not proven with a live Codex app-server conversation or equivalent runtime probe. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance: Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Land a retry-safe internal fuse only after maintainers accept the new guard policy: valid thread-bootstrap bindings should stay warm, but actual model-window or overflow retries must either project context into the fresh thread or avoid writing a bootstrapped binding. Do we have a high-confidence way to reproduce the issue? No high-confidence real reproduction is in hand. Source and tests show current main rotates at 70k and the PR simulates the new guard behavior, but the latest no-config rewrite was not proven with a live Codex app-server conversation or equivalent runtime probe. Is this the best way to solve the issue? No, not yet. Removing the config option fits the maintainer direction, but the thread-bootstrap bypass needs retry-safe projection behavior or an explicit maintainer decision to accept the session-state risk. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against a5eee8f1c678. Label changesLabel justifications:
Evidence reviewedPR surface: Source +63, Tests +257. Total +320 across 3 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
@clawsweeper re-review Fixed the P3 docs finding in Validation from Results: both passed. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@vincentkoc tagging you here for context and co-author credit. This PR is part of the native-thread stabilization work: when threads are not set up correctly, cache continuity and compaction can attach to the wrong thread or break across resumes. This change fixes that path so cached context and compaction state stay attached to the intended native thread. I also added a no-code co-author commit on this branch with |
|
I really do not want yet another config option. |
Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org>
6dac982 to
0177987
Compare
|
Behavior addressed
Real environment tested
Exact steps or command run after this patch
Evidence after fix
Observed result after fix
What was not tested
|
Summary
This is a stacked follow-up to #85978 for the hard-coded Codex app-server native-thread token reuse guard.
It makes the proactive native-token guard configurable instead of treating
70000as the only possible policy:agents.defaults.compaction.maxActiveTranscriptTokens.70000when unset, preserving existing behavior."120k"/"1.5k".0to disable only this proactive token guard.Stacking / Review Order
This branch is intentionally stacked on #85978 (
codex/native-thread-reuse-budget).Recommended order:
thread_bootstrapbinding could be cleared by the startup size guard before the context-engine projection decision had a chance to preserve it.If this PR is reviewed before #85978 lands, the intended new commit here is:
Why This Is Needed
The existing guard is not the model context window. It is a local Codex app-server native-thread reuse guard. Today it is hard-coded to
70000tokens, and when it fires OpenClaw clears the saved native thread binding before startup:That is a reasonable safety valve for legacy/stale bindings, but it is too blunt for long-running Codex sessions because the native token count can include the large bootstrap-loaded context already paid into the native thread.
Scenario:
sequenceDiagram participant OC as OpenClaw participant CE as Context Engine / Bootstrap participant CX as Codex app-server native thread OC->>CE: assemble bootstrap/context CE-->>OC: 80k-120k effective bootstrap/context OC->>CX: thread/start with developer instructions + projected context CX-->>OC: rollout token_count over 70k OC->>CX: next turn startup checks saved binding alt legacy hard guard OC->>OC: clear binding at 70k OC->>CX: cold thread/start again else semantic thread_bootstrap from #85978 OC->>OC: preserve matching semantic binding OC->>CX: thread/resume warm path endThe bootstrap question is the crux: for legacy/non-context-engine native reuse, the guard can effectively see the native thread after it has already absorbed large startup/bootstrap instructions. For active context-engine
thread_bootstrap, #85978 prevents that bootstrap cost from automatically invalidating the native binding on the next turn. This PR covers the remaining policy question by letting operators set the guard to match their deployment instead of forcing every installation through70000.What This Does Not Solve
This does not implement the larger #86023 architecture items:
Those should stay separate because they touch lifecycle design, diagnostics, and context-engine contracts rather than just the hard-coded native guard policy.
Real behavior proof
70000as the only possible token policy. A thread binding with86000recorded session tokens is preserved when configured to"120k"and cleared when configured to"50k"./Volumes/LEXAR/repos/worktrees/openclaw-codex-semantic-reuse-guard, using the actual patched runtime modules throughnode --import tsx.86000-token recorded native/session state preserves the existing binding under"120k"and clears it under"50k", matching the configurable guard behavior added by this PR.Validation
Focused local validation from
/Volumes/LEXAR/repos/worktrees/openclaw-codex-semantic-reuse-guard:Results: all passed locally.
I also ran four read-only adversarial review agents over the diff. Two returned zero >=95% findings; two found test/efficiency gaps that were fixed before this PR was opened.