fix(codex): preserve semantic native threads across compaction #86160
100yenadmin wants to merge 21 commits into
Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 7:16 AM ET / 11:16 UTC.
Summary
PR surface: Source +1743, Tests +2123, Docs +81, Generated 0. Total +3947 across 30 files. Reproducibility: no. No current-head live reproduction was established in this review. The source path and linked issue/PR evidence make the failure mode source-reproducible, but the current PR head still needs after-fix proof. Review metrics: 2 noteworthy metrics.
Merge readiness: the overall rating follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Proof guidance: Risk before merge
Next step before merge
Best possible solution: Refresh the PR body with current-head real behavior proof, then have maintainers review the semantic preservation policy, config/default surface, and diagnostic privacy boundary before landing the stack in a coherent order.
Do we have a high-confidence way to reproduce the issue? No current-head live reproduction was established in this review. The source path and linked issue/PR evidence make the failure mode source-reproducible, but the current PR head still needs after-fix proof.
Is this the best way to solve the issue? Unclear until current-head proof and maintainer policy review catch up.
Preserving compatible AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 545ad7f256e2.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Force-pushed 66906a1 to 996ebfe.
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.
@clawsweeper please re-review this PR against the latest head
/review
🦞🧹 I asked ClawSweeper to review this item again.
Force-pushed 996ebfe to 2ead386.
Force-pushed 2ead386 to e496a20.
Force-pushed a170a7e to 1055c44.
Add agents.defaults.compaction.maxActiveTranscriptTokens for Codex app-server native thread reuse. The default remains 70000 tokens for existing deployments; positive numeric or shorthand token-count values override it, and 0 disables only the proactive token guard while preserving byte limits and semantic binding invalidation. Also skip rollout directory scans when both native guards are disabled, document the setting, regenerate the config baseline hash, and cover rollout/session token sources plus byte-limit preservation in focused tests.
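A minimal sketch of how the setting described in this commit could be resolved, assuming a few things the commit message does not spell out: the function names resolveActiveTranscriptTokenGuard and shouldScanRolloutDirectory are hypothetical, the shorthand is assumed to be a number with an optional k/m suffix, and the "two native guards" are assumed to be the token guard and a byte guard. It only illustrates the stated rules: 70000 by default, positive values override, 0 disables the proactive token guard alone.

```typescript
// Hypothetical resolver for agents.defaults.compaction.maxActiveTranscriptTokens.
// Assumption: shorthand values look like "70k" or "1.5m" (case-insensitive).
const DEFAULT_MAX_ACTIVE_TRANSCRIPT_TOKENS = 70_000;

const SHORTHAND_MULTIPLIERS: Record<string, number> = { k: 1_000, m: 1_000_000 };

/**
 * Returns the proactive token limit, or null when the token guard is disabled (0).
 * Byte limits are deliberately not handled here: per the commit message they stay in force.
 */
function resolveActiveTranscriptTokenGuard(value: number | string | undefined): number | null {
  if (value === undefined) return DEFAULT_MAX_ACTIVE_TRANSCRIPT_TOKENS;

  let tokens: number;
  if (typeof value === "number") {
    tokens = value;
  } else {
    const match = /^\s*(\d+(?:\.\d+)?)\s*([km]?)\s*$/i.exec(value);
    if (!match) throw new Error(`invalid token count: ${value}`);
    const multiplier = match[2] ? SHORTHAND_MULTIPLIERS[match[2].toLowerCase()] : 1;
    tokens = Math.round(Number(match[1]) * multiplier);
  }

  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new Error(`invalid token count: ${String(value)}`);
  }
  // 0 turns off only the proactive token guard.
  return tokens === 0 ? null : tokens;
}

/** Hypothetical helper: rollout directory scans are only worth doing while a native guard is active. */
function shouldScanRolloutDirectory(tokenGuard: number | null, byteGuard: number | null): boolean {
  return tokenGuard !== null || byteGuard !== null;
}
```

Returning null for a disabled guard, rather than Infinity, keeps "disabled" distinct from "very large" for callers that log or report the effective limit.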
Add Vincent Koc as a co-author for the PR context and review trail. Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Force-pushed 1055c44 to c99a46c.
Summary
This is PR #4 in the Codex native-thread stack, after #85978, #86069, and #86094. It turns the diagnostics foundation into the long-term cache/lifecycle fix for compatible context-engine thread_bootstrap bindings:
- thread_bootstrap bindings across successful context-engine-owned compaction;
- turn/start so the fresh thread receives projected context;
- context-engine-compaction-preserved-binding lifecycle diagnostics and docs explaining the preserved vs invalidated paths.

Why this is not “raise 70k to 258k/272k”
The default 70000 native reuse guard remains a legacy/stale-binding safety rail. The healthy Codex + LCM path is semantic ownership: contextProjection.mode = "thread_bootstrap" with a stable epoch/fingerprint.

The important distinction is model capacity vs warm-thread lifecycle policy:
- 70000
- 272000
- 400000

With a 30% post-compaction target, even a compacted transcript can land above the guard:
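The budget arithmetic can be checked directly. This sketch assumes the 30% post-compaction target is taken of the 272000- and 400000-token figures listed above, treating them as model context capacities; the names NATIVE_REUSE_GUARD and postCompactionTokens are illustrative only.

```typescript
// Assumption: the 30% post-compaction target is a fraction of the model's context capacity.
const NATIVE_REUSE_GUARD = 70_000;
const POST_COMPACTION_TARGET_PERCENT = 30;

function postCompactionTokens(contextWindow: number, targetPercent: number): number {
  // Integer percent arithmetic avoids floating-point noise from 0.3.
  return Math.floor((contextWindow * targetPercent) / 100);
}

for (const contextWindow of [272_000, 400_000]) {
  const compacted = postCompactionTokens(contextWindow, POST_COMPACTION_TARGET_PERCENT);
  // Same comparison as the flow: tokens >= guard triggers a cold rotation.
  const overGuard = compacted >= NATIVE_REUSE_GUARD;
  console.log(`${contextWindow} capacity -> ${compacted} tokens after compaction; over guard: ${overGuard}`);
}
```

Under that assumption the compacted transcript is 81600 or 120000 tokens, both above 70000, before any new turns are added.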
So repeatedly applying a hard 70k warm-thread rotation rule to otherwise compatible Codex native threads makes long-running sessions take the cold path right after compaction. This PR keeps the 70k guard for legacy/non-semantic bindings, but lets semantically owned thread_bootstrap bindings survive compaction/rollover.

Local bootstrap size scan
I ran a bounded local scan over named bootstrap candidates (AGENTS.md, USER.md, MEMORY.md, SOUL.md, IDENTITY.md, TOOLS.md, BOOTSTRAP.md) under /Volumes/LEXAR/repos, /Volumes/LEXAR/Codex, ~/.openclaw, and ~/.codex, using ceil(bytes / 4) as a conservative token estimate.
~/.codex/memories/MEMORY.md)

Interpretation: the bug is not “every AGENTS.md is over 70k.” The failure mode is total rendered pressure: compacted history + context-engine projection + workspace bootstrap/developer instructions + native rollout growth. Once that total has created a warm native thread, the efficient Codex path is to preserve the semantically valid native thread rather than reproject the large stable payload every turn.
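A small sketch of the scan's token estimate and of the "total rendered pressure" idea. The ceil(bytes / 4) rule comes from the scan above; the RenderedPressure shape and the helper names are hypothetical and only mirror the four components named in the interpretation.

```typescript
import { statSync } from "node:fs";

// The scan's conservative estimate: ceil(bytes / 4) tokens.
function estimateTokensFromBytes(bytes: number): number {
  return Math.ceil(bytes / 4);
}

// Applies the same estimate to a file on disk (throws if the path does not exist).
function estimateFileTokens(path: string): number {
  return estimateTokensFromBytes(statSync(path).size);
}

// Hypothetical breakdown of what is rendered into the native thread.
interface RenderedPressure {
  compactedHistory: number;
  contextEngineProjection: number;
  bootstrapAndDeveloperInstructions: number;
  nativeRolloutGrowth: number;
}

function totalRenderedTokens(p: RenderedPressure): number {
  return (
    p.compactedHistory +
    p.contextEngineProjection +
    p.bootstrapAndDeveloperInstructions +
    p.nativeRolloutGrowth
  );
}

// The guard reacts to the total, not to any single component.
function exceedsGuard(p: RenderedPressure, guard = 70_000): boolean {
  return totalRenderedTokens(p) >= guard;
}
```

In this model no single component has to cross 70000 for the total to do so; with illustrative (not measured) values of 40000 + 15000 + 12000 + 8000, the total is 75000.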
Flow
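A minimal TypeScript reading of the lifecycle charted in this section, to make the preserve/rotate rules concrete. Everything here is hypothetical: the NativeBinding shape, identityToken, decide, and reconcileAfterCompaction are illustrative names, not the exported helpers used in the proof (such as reconcileContextEngineCompactedCodexBinding(...)). The "engine id + policy + epoch/fingerprint + tools" comparison is collapsed into one sha256 token, in the spirit of the hashed comparison fingerprint the proof describes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical binding record; the real sidecar format is not shown in this PR body.
interface NativeBinding {
  threadId: string;
  mode: "legacy" | "thread_bootstrap";
  identity: string; // hashed engine id + policy + epoch/fingerprint + tools
}

type Decision =
  | { action: "resume-warm"; threadId: string } // send current prompt only
  | { action: "resume-normal"; threadId: string }
  | { action: "start-cold" }; // thread/start + reproject context

// Hash identity inputs so raw values (for example an MCP email or path) are never stored.
function identityToken(parts: Record<string, string>): string {
  const canonical = JSON.stringify(Object.entries(parts).sort(([a], [b]) => a.localeCompare(b)));
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}

function decide(
  binding: NativeBinding | undefined,
  nativeTokens: number,
  tokenGuard: number | null, // null: proactive token guard disabled
  currentIdentity: string,
): Decision {
  if (!binding) return { action: "start-cold" };
  if (binding.mode === "legacy") {
    // Legacy / ownerless bindings keep the size-based rotation rule.
    if (tokenGuard !== null && nativeTokens >= tokenGuard) return { action: "start-cold" };
    return { action: "resume-normal", threadId: binding.threadId };
  }
  // Semantic bindings are judged on identity, not on transcript size.
  return binding.identity === currentIdentity
    ? { action: "resume-warm", threadId: binding.threadId }
    : { action: "start-cold" };
}

// After context-engine compaction: keep, move, or drop the binding (keyed by session file).
function reconcileAfterCompaction(
  store: Map<string, NativeBinding>,
  before: NativeBinding | undefined, // captured before the compactor touched the sidecar
  originalFile: string,
  successorFile: string | undefined,
  currentIdentity: string,
): boolean {
  const stable =
    before !== undefined && before.mode === "thread_bootstrap" && before.identity === currentIdentity;
  if (!before || !stable) {
    store.delete(originalFile); // next turn rotates cold via thread/start
    return false;
  }
  if (successorFile && successorFile !== originalFile) {
    store.set(successorFile, { ...before }); // copy binding to the successor file
    store.delete(originalFile); // clear the archived original
  } else {
    store.set(originalFile, { ...before }); // restore in place, even if the sidecar was cleared
  }
  return true;
}
```

App-server rejection of a resume and the byte limits are left out of this sketch; it only covers the token guard and identity checks.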
```mermaid
flowchart TD
  A["Long-running Codex session"] --> B{"Saved native binding?"}
  B -- "Legacy / ownerless / non-bootstrap" --> C{"session/native tokens >= guard?"}
  C -- "yes" --> D["Rotate cold: thread/start + reproject context"]
  C -- "no" --> E["Try normal resume"]
  B -- "context-engine thread_bootstrap" --> F{"Engine id + policy + epoch/fingerprint + tools match?"}
  F -- "no" --> D
  F -- "yes" --> G["Resume warm native thread"]
  G --> H["Send current prompt only"]
  H --> I{"Context-engine compaction?"}
  I -- "same file, identity stable" --> J["Preserve binding in place"]
  I -- "successor rollover, identity stable" --> K["Copy binding to successor; clear archived original"]
  I -- "identity changes or app-server rejects" --> D
  J --> G
  K --> G
```

Real behavior proof
- thread_bootstrap native bindings are preserved across successful context-engine compaction and successor session-file rollover, while legacy/non-bootstrap, mismatch, and app-server rejection paths still rotate/clear.
- /Volumes/LEXAR/repos/worktrees/openclaw-codex-semantic-reuse-guard, branch codex/semantic-native-thread-preservation, current head 8b405b4d5c59432b4f3218b570609d3d43f5a4ad. The proof used temp session files under /Volumes/LEXAR/Codex/openclaw-pr-86160-proof-8b405b4d5c and the actual exported reconcileContextEngineCompactedCodexBinding(...), writeCodexAppServerBinding(...), readCodexAppServerBinding(...), clearCodexAppServerBinding(...), and startOrResumeThread(...) helpers from this checkout.
- pnpm exec tsx proof probe, not Vitest/tsgo/build. The probe wrote matching thread_bootstrap bindings, simulated a compactor clearing the sidecar before same-file compaction, simulated successor rollover copy/archived clear, and seeded a legacy raw user-MCP binding to prove the mismatch path rotates through thread/start and persists a hashed comparison token. I also verified the patched worktree identity and whitespace state with pwd, git rev-parse HEAD, git branch --show-current, git status --porcelain, and git diff --check HEAD^ HEAD.
- git status --porcelain and git diff --check HEAD^ HEAD produced no output.
- preserved: true and restored thread-same-file-warm after the sidecar was cleared. Successor rollover returned preserved: true, cleared the archived/original binding (archivedBindingPresent: false), and copied thread-rollover-warm with the same thread_bootstrap epoch/fingerprint to the successor session file. The MCP mismatch path did not resume the stale thread; it called thread/start, saved thread-mcp-new, persisted a sha256: comparison fingerprint, and did not leak the seeded MCP email/path into the saved binding.
- “codex app-server native transcript exceeded active token limit; starting a fresh thread” at native token counts above the 70k reuse guard; issue #86023 (Codex long-running sessions should use semantic thread/bootstrap cache ownership) and this PR body include the budget math and local bootstrap-size scan explaining that failure mode.

Validation
Local validation intentionally skipped heavy suites per AGENTS.md / user instruction. GitHub Actions is the validation gate.
Local:
- pnpm exec tsx lifecycle proof probe, as captured above
- git diff --check HEAD^ HEAD

GitHub Actions:
- 8b405b4d5c59432b4f3218b570609d3d43f5a4ad is green, including the previously failing logging support-export slice after the stability-bundle mode parser fix and the rerun after the unrelated model-catalog timeout.

Not run locally:
- pnpm tsgo:extensions:test

Stack / Review Notes
This PR is stacked on #86094. The diff to review is the top commit:
8b405b4d5c fix(codex): preserve semantic native threads across compaction

Refs #86023.