Fix Codex native thread reuse for context-engine bootstraps #85978
100yenadmin wants to merge 5 commits into
Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 4:20 PM ET / 20:20 UTC.
Summary
PR surface: Source +18, Tests +195. Total +213 across 2 files.
Reproducibility: yes, at source level: current main applies the startup token/byte guard before the context-engine projection decision, so an oversized saved
Review metrics: none identified.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the narrow ordering fix after current-head CI and maintainer review, while leaving broader native-thread cache ownership, configurable guard policy, diagnostics, and compaction preservation to the linked follow-up stack.
Do we have a high-confidence way to reproduce the issue? Yes, at source level: current main applies the startup token/byte guard before the context-engine projection decision, so an oversized saved
Is this the best way to solve the issue? Yes, the proposed fix is the narrow maintainable path: defer only for active context-engine bootstrap bindings and let the existing engine/policy/projection/tool compatibility gate decide reuse. The stale/no-active-engine case remains on the existing cold-start path.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against a98660eebd2a.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +18, Tests +195. Total +213 across 2 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
@clawsweeper re-review
Addressed the P2 test harness finding in 16be33f by making
Results: context-engine 19/19 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
@clawsweeper re-review
Addressed the P2 stale/no-active context-engine finding in 1f6d544 by gating the startup size-guard deferral on a current active context engine. Oversized saved
Re-ran from
Results: context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Opened follow-up architecture issue #86023 for the broader long-running Codex session design. This PR remains the narrow correctness fix: preserve valid context-engine
@clawsweeper re-review
Follow-up pushed in 4e7ed27 after CI surfaced
Additional local validation from
Results: extension test typecheck passed; context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
4e7ed27 to 98d1082
@clawsweeper re-review
Rebased onto current upstream/main after GitHub reported conflicts. The conflict was only the test-helper type signature that upstream had already fixed with
Re-ran from
Results: extension test typecheck passed; context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Add Vincent Koc as a co-author for the PR context and review trail.
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
@vincentkoc tagging you here for context and co-author credit. This PR is part of the native-thread stabilization work: when threads are not set up correctly, cache continuity and compaction can attach to the wrong thread or break across resumes. This change fixes that path so cached context and compaction state stay attached to the intended native thread. I also added a no-code co-author commit on this branch with
Thanks Eva. This fix is now covered on main by 7a14741, which preserves context-engine thread-bootstrap reuse while keeping the native byte guard and moving the token guard to Codex's reported model context window with a high recovery fallback. Proof recorded on the landed commit:
The PR branch now conflicts with current main because the same area has already landed there, so I am closing this as landed via that commit.
Fixes #85975.
Summary
- Preserve thread_bootstrap Codex native thread bindings through the startup token/byte transcript guard so large bootstrap turns do not force a cold thread/start on every later turn.

Why this matters
This is not about the model's maximum context window. CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000 is a local OpenClaw/Codex app-server active-thread reuse guard. It can rotate a native thread much earlier than the model's real context limit.

For example, the current repo metadata around gpt-5.5 includes much larger limits than 70k: Copilot fallback metadata uses a 400_000 context window, and related legacy Codex metadata recognizes a 272_000 prompt-token shape with 128_000 max output. So a 70k native-thread guard can invalidate the warm thread while the selected model still has plenty of context headroom.

That distinction matters because thread_bootstrap is the token-efficiency contract:

Even when provider-side prompt caching helps with identical prefixes, the cold path still loses Codex native-thread reuse and creates avoidable gateway/app-server work: context assembly, prompt rendering, tool/app setup, rollout scanning, and a fresh native thread startup.
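To make the gap between the 70k reuse guard and the quoted model limits concrete, here is a minimal arithmetic sketch. The constant name and the numbers come from the description above; the quotedLimits object, its field names, and the guardFraction helper are hypothetical and exist only for this illustration.

```typescript
// Guard value quoted in the PR description.
const CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000;

// Limits quoted in the PR description for gpt-5.5 metadata.
// The object shape and field names are hypothetical, for illustration only.
const quotedLimits = {
  copilotFallbackContextWindow: 400_000,
  legacyCodexPromptTokens: 272_000,
};

// Fraction of a given model limit that is in use when the reuse guard fires.
function guardFraction(limitTokens: number): number {
  return CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS / limitTokens;
}

const vsContextWindow = guardFraction(quotedLimits.copilotFallbackContextWindow);
const vsPromptShape = guardFraction(quotedLimits.legacyCodexPromptTokens);

console.log(`guard fires at ${(vsContextWindow * 100).toFixed(1)}% of the 400k context window`);
console.log(`guard fires at ${(vsPromptShape * 100).toFixed(1)}% of the 272k prompt-token shape`);
```

Under these quoted figures the guard trips well before either limit is approached, which is the headroom the section describes.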
Thread/cache flow
flowchart TD
  A[New Discord/user turn] --> B[OpenClaw loads saved Codex native thread binding]
  B --> C{Saved binding has contextEngine.projection.mode = thread_bootstrap?}
  C -- yes --> D[This PR: defer startup token/byte guard]
  D --> E[Context engine assembles current view]
  E --> F{Engine, policy, epoch, fingerprint, and tools still match?}
  F -- yes --> G[thread/resume warm native Codex thread]
  G --> H[turn/start avoids context-engine history replay]
  F -- no --> I[Clear stale binding and start fresh thread]
  C -- no --> J[Legacy/workspace-bootstrap path still uses 70k startup guard]
  J --> K{nativeTokens or sessionTokens >= 70k?}
  K -- yes --> L[Clear binding; thread/start cold path]
  K -- no --> G

10-turn scenario
Illustrative numbers, not exact billing math:
- thread_bootstrap behavior over 10 turns: one large context-engine bootstrap, then 9 warm resumes. Replayed bootstrap pressure is roughly 90k + 9 * 2k = 108k plus model-visible continuation state and any separate workspace/turn-scoped context surfaces.
- Cold start on every turn: roughly 10 * 90k = 900k of repeated bootstrap pressure before counting deltas.

The important part is the shape, not the exact multiplier: once the native thread is rotated every turn, the system stops amortizing the bootstrap.
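The amortization shape can be sketched as two small functions. The 90k bootstrap and 2k per-turn figures are the illustrative numbers from the scenario above; the constant and function names are hypothetical, and the sketch ignores continuation state and other context surfaces.

```typescript
// Illustrative figures from the 10-turn scenario, not billing math.
const BOOTSTRAP_TOKENS = 90_000; // one large context-engine bootstrap
const WARM_DELTA_TOKENS = 2_000; // illustrative per-turn cost on a warm resume
const TURNS = 10;

// Warm path: pay the bootstrap once, then only the delta on each later turn.
function warmPressure(turns: number): number {
  if (turns <= 0) return 0;
  return BOOTSTRAP_TOKENS + (turns - 1) * WARM_DELTA_TOKENS;
}

// Cold path: every turn rotates the native thread and replays the bootstrap.
function coldPressure(turns: number): number {
  return Math.max(turns, 0) * BOOTSTRAP_TOKENS;
}

const warm = warmPressure(TURNS); // 90k + 9 * 2k = 108k
const cold = coldPressure(TURNS); // 10 * 90k = 900k
console.log({ warm, cold, ratio: cold / warm });
```

The warm total grows by the small delta per turn while the cold total grows by the full bootstrap, so the gap widens with every additional turn.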
What this PR fixes
Current startup order applied the proactive startup token/byte guard first, and only afterwards checked whether the thread_bootstrap binding is still compatible.

That means an 86k-token bootstrap rollout could delete a still-valid thread_bootstrap binding before the code reached the compatibility check that would have allowed reuse.

This PR changes only that ordering for bindings that already declare contextEngine.projection.mode === "thread_bootstrap" and when a current active context engine can re-evaluate the saved binding:
- When the saved binding is thread_bootstrap and the current run has an active context engine, startup defers the proactive byte/token guard.
- startOrResumeThread still clears the binding if the current context-engine policy/projection/tool binding is incompatible.

So the PR fixes a concrete correctness bug: compatible context-engine bootstrap threads should be allowed to reach the existing compatibility gate instead of being preemptively deleted by a generic native transcript-size guard.
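A minimal sketch of the ordering change, assuming simplified shapes: the SavedBinding interface, the StartupDecision labels, and the decideStartup function are hypothetical stand-ins, not the actual OpenClaw code. The sketch models only the token side of the guard; the real guard also considers bytes and session tokens.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types are not shown here.
type ProjectionMode = "thread_bootstrap" | "per_turn";

interface SavedBinding {
  threadId: string;
  projectionMode?: ProjectionMode; // stands in for contextEngine.projection.mode
  nativeTokens: number;
}

const NATIVE_THREAD_MAX_TOKENS = 70_000; // guard value quoted in the PR

type StartupDecision = "defer-to-compatibility-gate" | "clear-binding" | "keep-binding";

function decideStartup(binding: SavedBinding, hasActiveContextEngine: boolean): StartupDecision {
  // Patched ordering: a thread_bootstrap binding with an active context engine
  // skips the proactive size guard and is judged by the compatibility gate instead.
  if (binding.projectionMode === "thread_bootstrap" && hasActiveContextEngine) {
    return "defer-to-compatibility-gate";
  }
  // Everything else, including stale or no-active-engine cases, keeps the existing guard.
  return binding.nativeTokens >= NATIVE_THREAD_MAX_TOKENS ? "clear-binding" : "keep-binding";
}

const oversizedBootstrap: SavedBinding = {
  threadId: "t-1",
  projectionMode: "thread_bootstrap",
  nativeTokens: 86_000,
};
const oversizedLegacy: SavedBinding = { threadId: "t-2", nativeTokens: 86_000 };

console.log(decideStartup(oversizedBootstrap, true)); // "defer-to-compatibility-gate"
console.log(decideStartup(oversizedBootstrap, false)); // "clear-binding"
console.log(decideStartup(oversizedLegacy, true)); // "clear-binding"
```

Deferring does not mean the binding is always reused: in the PR, startOrResumeThread still clears it when the compatibility gate finds a mismatch.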
What this PR does not fully solve
A tester reported that after locally merging this PR, their Discord channel was still slow and logged:
They also observed the cold path re-injecting/truncating large bootstrap files such as AGENTS.md, USER.md, and MEMORY.md.

Based on the code, that exact warning after this PR means the startup exemption did not apply for that session. The most likely explanations are:
- the saved binding has no contextEngine.projection.mode = "thread_bootstrap" marker, so it is on the legacy/workspace-bootstrap path;
- the saved binding is per_turn, has no valid epoch, or has no projection binding;

In other words: this PR removes one real source of churn, but it is not the full architecture fix for all oversized Discord/Codex sessions. It makes the safe thread_bootstrap path behave like its contract says. Sessions that never enter or never retain that path can still cold-start every turn.

Follow-up architecture
The broader architecture should probably move toward:
- avoiding AGENTS.md/USER.md/MEMORY.md reinjection when the same files are unchanged;
- diagnostics covering last_token_usage vs total_token_usage vs sessions.json, and which bootstrap files dominate rendered context;

Useful logs to compare on a slow channel after this PR:
Real behavior proof
- Startup no longer clears a compatible thread_bootstrap native thread just because the bootstrap rollout exceeded the native token or byte guard.
- Ran from /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse, using Lexar scratch under /Volumes/LEXAR/Codex/openclaw-codex-native-thread-proof-20260524-151049.
- Ran a node --import tsx --input-type=module probe against the patched production testing.rotateOversizedCodexAppServerStartupBinding export. The probe wrote real Codex app-server binding, sessions.json, and rollout files for token-oversized and byte-oversized thread_bootstrap cases, then read the saved binding back from disk.
- The saved binding read back as a thread-bootstrapped binding with projection=thread_bootstrap; the startup guard did not clear the binding.

Verification
- A thread_bootstrap binding whose native rollout reports 86k latest tokens now uses thread/resume and does not replay assembled bootstrap context into turn/start.
- Ran from /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse.
- pnpm tsgo:extensions:test: passed.
- run-attempt.context-engine.test.ts: 20/20 passed, including the stale/no-active context-engine regression.
- run-attempt.test.ts native guard slice: 3 passed, 215 skipped.

Full repository-wide suites were intentionally left to GitHub CI per the local-resource policy.