
Fix Codex native thread reuse for context-engine bootstraps #85978

Closed
100yenadmin wants to merge 5 commits into openclaw:main from 100yenadmin:codex/native-thread-reuse-budget

Conversation

@100yenadmin

@100yenadmin 100yenadmin commented May 24, 2026

Contributor

Fixes #85975.

Summary

  • Preserve context-engine thread_bootstrap Codex native thread bindings through the startup token/byte transcript guard so large bootstrap turns do not force cold thread/start on every later turn.
  • Keep native reuse bounded by the existing context-engine compatibility checks: engine/policy, projection epoch/fingerprint, dynamic tool fingerprint, and compaction-driven binding invalidation.
  • Add regression coverage for token- and byte-oversized bootstrap transcripts resuming the warmed native thread while avoiding context-engine assembled-history replay.

Why this matters

This is not about the model's maximum context window. CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000 is a local OpenClaw/Codex app-server active-thread reuse guard. It can rotate a native thread much earlier than the model's real context limit.

For example, the current repo metadata around gpt-5.5 includes much larger limits than 70k: Copilot fallback metadata uses a 400_000 context window and related legacy Codex metadata recognizes a 272_000 prompt-token shape with 128_000 max output. So a 70k native-thread guard can invalidate the warm thread while the selected model still has plenty of context headroom.

That distinction matters because thread_bootstrap is the token-efficiency contract:

  • First turn: OpenClaw assembles expensive context and injects it into a new Codex native thread.
  • Later turns, best case: OpenClaw resumes the same native thread and does not replay the context-engine assembled history; the bootstrapped context is already inside the native thread. Other workspace/bootstrap surfaces may still contribute turn instructions or prompt context separately.
  • Broken case: OpenClaw clears the native thread binding before it checks the context-engine bootstrap binding, so every turn cold-starts and re-injects/reprojects the large bootstrap payload.

Even when provider-side prompt caching helps with identical prefixes, the cold path still loses Codex native-thread reuse and creates avoidable gateway/app-server work: context assembly, prompt rendering, tool/app setup, rollout scanning, and a fresh native thread startup.
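
As a rough illustration of the gap between the local reuse guard and the model limits quoted above, here is a minimal sketch. Only the constant name, the 70_000 value, and the 400_000 / 272_000 figures come from this description; `QuotedLimit` and `guardHeadroom` are hypothetical names for illustration and are not the repo's metadata types.

```typescript
// Name and value quoted in this PR description.
const CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000;

// Hypothetical shape for illustration only.
interface QuotedLimit {
  label: string;
  tokenLimit: number;
}

// Limits mentioned in this description for gpt-5.5-related metadata.
const quotedLimits: QuotedLimit[] = [
  { label: "Copilot fallback context window", tokenLimit: 400_000 },
  { label: "legacy Codex prompt-token shape", tokenLimit: 272_000 },
];

// Tokens still unused under the quoted limit when the local guard fires.
function guardHeadroom(limit: QuotedLimit): number {
  return limit.tokenLimit - CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS;
}

for (const limit of quotedLimits) {
  console.log(`${limit.label}: ${guardHeadroom(limit)} tokens of headroom at rotation`);
}
```

Under these figures the guard would rotate the warm thread with 330k or 202k tokens of nominal headroom remaining, which is why the guard and the model limit should be treated as separate concerns.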

Thread/cache flow

flowchart TD
  A[New Discord/user turn] --> B[OpenClaw loads saved Codex native thread binding]
  B --> C{Saved binding has contextEngine.projection.mode = thread_bootstrap?}
  C -- yes --> D[This PR: defer startup token/byte guard]
  D --> E[Context engine assembles current view]
  E --> F{Engine, policy, epoch, fingerprint, and tools still match?}
  F -- yes --> G[thread/resume warm native Codex thread]
  G --> H[turn/start avoids context-engine history replay]
  F -- no --> I[Clear stale binding and start fresh thread]
  C -- no --> J[Legacy/workspace-bootstrap path still uses 70k startup guard]
  J --> K{nativeTokens or sessionTokens >= 70k?}
  K -- yes --> L[Clear binding; thread/start cold path]
  K -- no --> G

10-turn scenario

Illustrative numbers, not exact billing math:

  • A context-engine bootstrap payload renders to 90k tokens.
  • Each later user turn adds a small 2k-token prompt/delta.
  • Desired thread_bootstrap behavior over 10 turns: one large context-engine bootstrap, then 9 warm resumes. Total prompt pressure is roughly 90k + 9 * 2k = 108k, plus model-visible continuation state and any separate workspace/turn-scoped context surfaces.
  • Broken startup-guard behavior: the first 90k turn records native usage over the 70k guard. Each later startup clears the binding, starts a fresh native thread, and replays/rebuilds the large bootstrap again. That can become roughly 10 * 90k = 900k of repeated bootstrap pressure before counting deltas.

The important part is the shape, not the exact multiplier: once the native thread is rotated every turn, the system stops amortizing the bootstrap.
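
The arithmetic in the scenario above can be reproduced with a short sketch. The 90k and 2k figures are the illustrative numbers from the list, not measurements; `warmPressure` and `coldBootstrapPressure` are hypothetical helper names.

```typescript
// Illustrative figures from the 10-turn scenario; not billing math.
const BOOTSTRAP_TOKENS = 90_000;
const DELTA_TOKENS = 2_000;

// Warm path: one bootstrap, then each later turn adds only its delta.
function warmPressure(turns: number): number {
  return BOOTSTRAP_TOKENS + (turns - 1) * DELTA_TOKENS;
}

// Cold path: every turn rebuilds the bootstrap (deltas not counted, as in the text).
function coldBootstrapPressure(turns: number): number {
  return turns * BOOTSTRAP_TOKENS;
}

console.log(warmPressure(10)); // 108000
console.log(coldBootstrapPressure(10)); // 900000
```

The warm path grows by the small delta per turn, while the cold path grows by the full bootstrap per turn, so the gap widens with session length.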

What this PR fixes

Before this PR, the startup order was:

  1. Read saved Codex app-server thread binding.
  2. Apply byte/token native transcript guard.
  3. Only later ask the context engine whether the saved thread_bootstrap binding is still compatible.

That means an 86k-token bootstrap rollout could delete a still-valid thread_bootstrap binding before the code reached the compatibility check that would have allowed reuse.

This PR changes only that ordering, and only for bindings that already declare contextEngine.projection.mode === "thread_bootstrap" while an active context engine is available to re-evaluate the saved binding:

  • If the saved binding is a context-engine thread_bootstrap and the current run has an active context engine, startup defers the proactive byte/token guard.
  • startOrResumeThread still clears the binding if the current context-engine policy/projection/tool binding is incompatible.
  • If the saved binding is stale and no context engine is active for the current turn, the startup size guard still clears the oversized native thread so prompt construction projects mirrored history into the fresh thread.
  • Non-bootstrap native sessions still use the existing byte/token startup guard.

So the PR fixes a concrete correctness bug: compatible context-engine bootstrap threads should be allowed to reach the existing compatibility gate instead of being preemptively deleted by a generic native transcript-size guard.
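
The ordering change described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in run-attempt.ts: the `SavedBinding` shape, `shouldDeferSizeGuard`, and `applyStartupGuard` are hypothetical names, and the byte guard is omitted for brevity.

```typescript
// Hypothetical, simplified binding shape; the real type lives in the repo.
interface SavedBinding {
  threadId: string;
  nativeTokens: number;
  contextEngine?: { projection?: { mode?: "thread_bootstrap" | "per_turn" } };
}

const NATIVE_THREAD_MAX_TOKENS = 70_000; // value quoted in this PR description

// True when the proactive size guard should be skipped at startup.
function shouldDeferSizeGuard(binding: SavedBinding, hasActiveContextEngine: boolean): boolean {
  return hasActiveContextEngine && binding.contextEngine?.projection?.mode === "thread_bootstrap";
}

// Returns the binding to carry forward, or undefined to force a cold thread/start.
function applyStartupGuard(binding: SavedBinding, hasActiveContextEngine: boolean): SavedBinding | undefined {
  if (shouldDeferSizeGuard(binding, hasActiveContextEngine)) {
    return binding; // the later compatibility gate decides whether to reuse it
  }
  return binding.nativeTokens >= NATIVE_THREAD_MAX_TOKENS ? undefined : binding;
}

const bootstrapped: SavedBinding = {
  threadId: "thread-bootstrapped",
  nativeTokens: 86_000,
  contextEngine: { projection: { mode: "thread_bootstrap" } },
};

console.log(applyStartupGuard(bootstrapped, true)?.threadId); // kept for the compatibility gate
console.log(applyStartupGuard(bootstrapped, false)); // cleared: no engine can re-evaluate it
```

In this model a deferred binding is not yet approved for reuse; it is only allowed to reach the compatibility check, which can still clear it.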

What this PR does not fully solve

A tester reported that after locally merging this PR, their Discord channel was still slow and logged:

nativeTokens=116268, max 70000
codex app-server native transcript exceeded active token limit; starting a fresh thread

They also observed the cold path re-injecting/truncating large bootstrap files such as AGENTS.md, USER.md, and MEMORY.md.

Based on the code, that exact warning after this PR means the startup exemption did not apply for that session. The most likely explanations are:

  • the saved binding has no contextEngine.projection.mode = "thread_bootstrap" marker, so it is on the legacy/workspace-bootstrap path;
  • the context engine is returning per_turn, no valid epoch, or no projection binding;
  • the projection epoch/fingerprint, policy fingerprint, dynamic tool fingerprint, MCP/app config, or environment selection changes every turn, so lifecycle compatibility still starts a fresh thread;
  • the local test build did not include the latest PR head; or
  • the session is a non-context-engine native session, where the old 70k guard intentionally still applies.

In other words: this PR removes one real source of churn, but it is not the full architecture fix for all oversized Discord/Codex sessions. It makes the safe thread_bootstrap path behave like its contract says. Sessions that never enter or never retain that path can still cold-start every turn.
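
When diagnosing a session that still cold-starts, it can help to enumerate which of the conditions above failed. The following is a minimal sketch of that idea; the `ProjectionState` field names and `explainRotation` are hypothetical and only mirror the checks named in this description, not the actual comparison in thread-lifecycle.ts.

```typescript
// Hypothetical field names chosen to mirror the checks listed above.
interface ProjectionState {
  engineId: string;
  policyFingerprint: string;
  mode: "thread_bootstrap" | "per_turn";
  epoch?: string;
  projectionFingerprint?: string;
  toolFingerprint: string;
}

// Returns the reasons a saved binding would not be reused; empty means reuse is allowed.
function explainRotation(saved?: ProjectionState, current?: ProjectionState): string[] {
  if (!saved) return ["saved binding has no context-engine projection marker"];
  if (!current) return ["no active context engine for this turn"];
  const reasons: string[] = [];
  if (saved.mode !== "thread_bootstrap") reasons.push("saved binding is not thread_bootstrap");
  if (current.mode !== "thread_bootstrap") reasons.push("engine returned a non-bootstrap projection");
  if (!current.epoch) reasons.push("current projection has no valid epoch");
  if (saved.engineId !== current.engineId) reasons.push("engine changed");
  if (saved.policyFingerprint !== current.policyFingerprint) reasons.push("policy fingerprint changed");
  if (saved.epoch !== current.epoch) reasons.push("projection epoch changed");
  if (saved.projectionFingerprint !== current.projectionFingerprint) reasons.push("projection fingerprint changed");
  if (saved.toolFingerprint !== current.toolFingerprint) reasons.push("dynamic tool fingerprint changed");
  return reasons;
}
```

A fingerprint that changes on every turn would show up here as the same reason repeating across turns, which is the pattern to look for on a slow channel.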

Follow-up architecture

The broader architecture should probably move toward:

  • treating context/bootstrap ownership as the lifecycle authority: rotate on semantic epoch/fingerprint/tool/config changes, not only on a hard-coded 70k native usage number;
  • making the native reuse guard model/config aware instead of a fixed local threshold that can be lower than common bootstrap payloads;
  • bringing workspace bootstrap files under an explicit cache/binding contract, or at least avoiding full AGENTS.md / USER.md / MEMORY.md reinjection when the same files are unchanged;
  • logging enough provenance to diagnose every rotation: binding mode, projection epoch/fingerprint, projection decision reason, token source (last_token_usage vs total_token_usage vs sessions.json), and which bootstrap files dominate rendered context;
  • distinguishing "provider/model context overflow" from "OpenClaw native warm-thread reuse guard" in warnings and status output.

Useful logs to compare on a slow channel after this PR:

codex app-server deferring native transcript size guard for context-engine thread bootstrap
codex app-server context-engine projection decision
codex app-server wrote context-engine thread binding
codex app-server context-engine binding changed; starting a new thread
codex app-server native transcript exceeded active token limit; starting a fresh thread
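
One way to compare those logs across turns is to tally which rotation-related messages appear. This is a rough sketch: the message substrings are copied from the list above, while the category labels and `tallyRotationSignals` are assumptions made for illustration.

```typescript
// Substrings copied from the log list above; the labels are illustrative assumptions.
const ROTATION_SIGNALS: Array<{ needle: string; label: string }> = [
  { needle: "deferring native transcript size guard for context-engine thread bootstrap", label: "deferred" },
  { needle: "context-engine binding changed; starting a new thread", label: "semantic-rotation" },
  { needle: "native transcript exceeded active token limit; starting a fresh thread", label: "size-guard-rotation" },
];

// Counts how often each rotation-related message appears in a set of log lines.
function tallyRotationSignals(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    for (const signal of ROTATION_SIGNALS) {
      if (line.includes(signal.needle)) {
        counts[signal.label] = (counts[signal.label] ?? 0) + 1;
      }
    }
  }
  return counts;
}
```

A session dominated by size-guard rotations is likely on the legacy path, while repeated semantic rotations suggest a fingerprint or epoch that changes every turn.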

Real behavior proof

  • Behavior or issue addressed: Codex app-server startup binding rotation should not clear a still-compatible context-engine thread_bootstrap native thread just because the bootstrap rollout exceeded the native token or byte guard.
  • Real environment tested: local patched OpenClaw checkout at /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse, using Lexar scratch under /Volumes/LEXAR/Codex/openclaw-codex-native-thread-proof-20260524-151049.
  • Exact steps or command run after this patch: ran a live node --import tsx --input-type=module probe against the patched production testing.rotateOversizedCodexAppServerStartupBinding export. The probe wrote real Codex app-server binding, sessions.json, and rollout files for token-oversized and byte-oversized thread_bootstrap cases, then read the saved binding back from disk.
  • Evidence after fix: terminal output from the local OpenClaw checkout:
/Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse
REAL_BEHAVIOR_PROOF case=token-over-86k returnedThreadId=thread-bootstrapped savedThreadId=thread-bootstrapped projection=thread_bootstrap
REAL_BEHAVIOR_PROOF case=byte-over-2k returnedThreadId=thread-bootstrapped savedThreadId=thread-bootstrapped projection=thread_bootstrap
REAL_BEHAVIOR_PROOF proofRoot=/Volumes/LEXAR/Codex/openclaw-codex-native-thread-proof-20260524-151049
  • Observed result after fix: both oversized native transcript cases returned and persisted the warmed thread-bootstrapped binding with projection=thread_bootstrap; the startup guard did not clear the binding.
  • What was not tested: no live Discord channel or real Codex app-server network session was exercised from this PR branch; full broad validation is left to GitHub CI.
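
The REAL_BEHAVIOR_PROOF lines above are plain key=value pairs, so the pass condition can be checked mechanically. The sketch below is not the probe that produced the output; `parseProofLine` and `bindingPreserved` are hypothetical helpers that only restate the observed-result criterion.

```typescript
// Parses "key=value" tokens from a REAL_BEHAVIOR_PROOF line like the ones above.
function parseProofLine(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return fields;
}

// A case passes when the returned and persisted thread ids match and the
// projection marker is still thread_bootstrap.
function bindingPreserved(line: string): boolean {
  const fields = parseProofLine(line);
  return (
    fields["returnedThreadId"] !== undefined &&
    fields["returnedThreadId"] === fields["savedThreadId"] &&
    fields["projection"] === "thread_bootstrap"
  );
}

console.log(
  bindingPreserved(
    "REAL_BEHAVIOR_PROOF case=token-over-86k returnedThreadId=thread-bootstrapped savedThreadId=thread-bootstrapped projection=thread_bootstrap",
  ),
); // true
```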

Verification

  • Behavior addressed: a saved thread_bootstrap binding whose native rollout reports 86k latest tokens now uses thread/resume and does not replay assembled bootstrap context into turn/start.
  • Real environment tested: local Lexar OpenClaw worktree at /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse.
  • Exact commands run after this patch:
pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts extensions/codex/src/app-server/run-attempt.test.ts
pnpm tsgo:extensions:test
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
git diff --check
  • Evidence after fix:
    • pnpm tsgo:extensions:test: passed.
    • run-attempt.context-engine.test.ts: 20/20 passed, including the stale/no-active context-engine regression.
    • targeted run-attempt.test.ts native guard slice: 3 passed, 215 skipped.
    • formatting and whitespace checks passed.
  • Pi/runtime risk review: parallel adversarial review found this change limited to Codex app-server startup binding rotation; Pi embedded-runner/shared compaction semantics are not changed.

Full repository-wide suites were intentionally left to GitHub CI per the local-resource policy.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 4:20 PM ET / 20:20 UTC.

Summary
The PR defers the Codex app-server startup token/byte transcript guard for active context-engine thread_bootstrap bindings and adds regression tests for oversized bootstrap reuse plus stale/no-active-engine rotation.

PR surface: Source +18, Tests +195. Total +213 across 2 files.

Reproducibility: yes, at source level. Current main applies the startup token/byte guard before the context-engine projection decision, so an oversized saved thread_bootstrap binding can be cleared before the existing semantic compatibility gate runs. I did not run tests in this read-only review, but the code path and added regression cases are clear.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
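
The "weaker of proof and patch quality" rule can be read as taking the minimum over an ordered scale. The sketch below assumes the ranks compare in the order of the bot's legend, weakest to strongest; that ordering and the `overallRank` helper are assumptions, not ClawSweeper's actual implementation.

```typescript
// Rank names from the bot's legend, ordered weakest to strongest (assumed ordering).
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever input ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "platinum hermit")); // platinum hermit
```

This matches the ratings shown above, where diamond lobster proof and platinum hermit patch quality yield an overall platinum hermit.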

Rank-up moves:

  • Keep current-head CI green on the merge ref before landing.

Risk before merge

  • This intentionally lets oversized Codex native threads survive the startup size guard for active context-engine bootstrap bindings; if the semantic compatibility checks are incomplete, a stale native context could be resumed instead of rotated. The patch uses the existing engine/policy/projection/tool checks and adds stale/no-active-engine coverage, so this is a maintainer merge-risk consideration rather than a blocking finding.

Maintainer options:

  1. Land after current-head CI (recommended)
    Accept the session-state lifecycle risk once the merge ref stays green and maintainers are comfortable with the existing semantic compatibility checks as the owner of bootstrap reuse.
  2. Request live long-session proof
    Ask for a real Codex long-running channel or gateway replay if maintainers want transport-level latency proof before changing native thread reuse behavior.
  3. Defer to the larger lifecycle stack
    Pause this PR only if maintainers decide the startup guard should not be narrowed separately from the broader semantic native-thread ownership work.

Next step before merge
No automated repair is needed because this review found no actionable patch defect; the remaining action is normal maintainer review and CI gating.

Security
Cleared: The diff only changes Codex app-server runtime/test logic and does not add dependencies, workflows, permissions, secret handling, or new external code execution paths.

Review details

Best possible solution:

Land the narrow ordering fix after current-head CI and maintainer review, while leaving broader native-thread cache ownership, configurable guard policy, diagnostics, and compaction preservation to the linked follow-up stack.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main applies the startup token/byte guard before the context-engine projection decision, so an oversized saved thread_bootstrap binding can be cleared before the existing semantic compatibility gate runs. I did not run tests in this read-only review, but the code path and added regression cases are clear.

Is this the best way to solve the issue?

Yes, the proposed fix is the narrow maintainable path: defer only for active context-engine bootstrap bindings and let the existing engine/policy/projection/tool compatibility gate decide reuse. The stale/no-active-engine case remains on the existing cold-start path.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a98660eebd2a.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes terminal proof from a patched checkout exercising the production binding-rotation helper with real binding, session, and rollout files, showing token- and byte-oversized thread_bootstrap bindings preserved after the fix.

Label justifications:

  • P2: The PR fixes a normal-priority Codex session-state regression with limited surface area and focused tests.
  • merge-risk: 🚨 session-state: Merging changes when an oversized Codex native thread binding is preserved versus cleared, which can affect context/session reuse semantics.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes terminal proof from a patched checkout exercising the production binding-rotation helper with real binding, session, and rollout files, showing token- and byte-oversized thread_bootstrap bindings preserved after the fix.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes terminal proof from a patched checkout exercising the production binding-rotation helper with real binding, session, and rollout files, showing token- and byte-oversized thread_bootstrap bindings preserved after the fix.
Evidence reviewed

PR surface:

Source +18, Tests +195. Total +213 across 2 files.

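| Area      | Files | Added | Removed | Net  |
| --------- | ----- | ----- | ------- | ---- |
| Source    | 1     | 18    | 0       | +18  |
| Tests     | 1     | 197   | 2       | +195 |
| Docs      | 0     | 0     | 0       | 0    |
| Config    | 0     | 0     | 0       | 0    |
| Generated | 0     | 0     | 0       | 0    |
| Other     | 0     | 0     | 0       | 0    |
| Total     | 2     | 215   | 2       | +213 |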

What I checked:

  • Repository policy: Read the root and scoped extensions policy; the review applied OpenClaw guidance for plugin boundaries, session-state compatibility, proof, and merge-risk handling. (AGENTS.md:1, a98660eebd2a)
  • Current main startup ordering: Current main reads a saved Codex app-server binding and applies the byte/token startup guard inside rotateOversizedCodexAppServerStartupBinding before context-engine projection compatibility is evaluated. (extensions/codex/src/app-server/run-attempt.ts:835, a98660eebd2a)
  • Existing compatibility gate: startOrResumeThread already clears a saved binding when the current context-engine binding or projection is missing or incompatible, which is the semantic gate this PR lets valid bootstraps reach. (extensions/codex/src/app-server/thread-lifecycle.ts:137, a98660eebd2a)
  • PR runtime change: The PR adds a stored thread_bootstrap projection check and defers the startup size guard only when a current active context engine is present, then passes that active-engine fact from runCodexAppServerAttempt. (extensions/codex/src/app-server/run-attempt.ts:856, 1b9bac83760b)
  • Regression coverage: The PR adds oversized token and byte cases proving matching thread_bootstrap bindings resume without replaying assembled context, plus a stale/no-active-engine case that still starts fresh and projects mirrored history. (extensions/codex/src/app-server/run-attempt.context-engine.test.ts:545, 1b9bac83760b)
  • Live PR state: The live PR is open, unmerged, mergeable clean, and currently points at head 1b9bac83760b41e8bb053101b67a17167178a0f0. (1b9bac83760b)

Likely related people:

  • hansolo949: Authored the merged context-budget guard PR that added the 70k native rollout rotation behavior this PR narrows for context-engine bootstraps. (role: introduced related guard behavior; confidence: high; commits: 084318b8c461; files: extensions/codex/src/app-server/run-attempt.ts)
  • steipete: Merged the prior context-budget guard PR and appears in recent Codex app-server history around the affected file. (role: merger of related behavior; confidence: medium; commits: 084318b8c461, 1b68dbe95ad5; files: extensions/codex/src/app-server/run-attempt.ts)
  • Vincent Koc: Local blame for the relevant current-main startup guard and thread-lifecycle lines resolves to the latest grafted current-main commit by this contributor; the PR discussion also tags them for context credit. (role: recent current-main area contributor; confidence: medium; commits: b0c8a4d11ddd; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
  • pashpashpash: Recent merged work touched Codex compaction/session behavior adjacent to this native-thread lifecycle surface. (role: recent adjacent contributor; confidence: low; commits: dd47e479aedb; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal backlog priority with limited blast radius. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 💎 rare Neon Crabkin

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 💎 rare.
Trait: purrs at green checks.
Image traits: location review cove; accessory proof snapshot camera; palette amber, ink, and glacier blue; mood determined; pose balancing on a branch marker; shell starlit enamel shell; lighting subtle sparkle highlights; background little resolved-comment flags.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@100yenadmin

Contributor Author

@clawsweeper re-review

Addressed the P2 test harness finding in 16be33f by making completeTurn accept the active thread id and completing the resumed thread-bootstrapped thread in the new regression. Re-ran:

pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
git diff --check

Results: context-engine 19/19 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.

@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 24, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 24, 2026
@100yenadmin

Contributor Author

@clawsweeper re-review

Addressed the P2 stale/no-active context-engine finding in 1f6d544 by gating the startup size-guard deferral on a current active context engine. Oversized saved thread_bootstrap bindings without an active engine now use the existing native size guard, start a fresh thread, and project mirrored history into turn/start. Added a regression covering that stale binding path.

Re-ran from /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse:

pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
git diff --check

Results: context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.

@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026
@100yenadmin

Contributor Author

Opened follow-up architecture issue #86023 for the broader long-running Codex session design.

This PR remains the narrow correctness fix: preserve valid context-engine thread_bootstrap native reuse through the startup guard, while keeping stale/no-active-engine bindings safe. The follow-up issue tracks the larger work around semantic thread/cache ownership, model/config-aware native guards, binding continuity across LCM/session-file rotation, workspace bootstrap fingerprints, and diagnostics.

@100yenadmin

Contributor Author

@clawsweeper re-review

Follow-up pushed in 4e7ed27 after CI surfaced check-test-types. The code path is unchanged; this only fixes test typing for the new stale-binding regression and an app-server test helper tuple signature.

Additional local validation from /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse:

pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts extensions/codex/src/app-server/run-attempt.test.ts
pnpm tsgo:extensions:test
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
git diff --check

Results: extension test typecheck passed; context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.

@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@100yenadmin 100yenadmin force-pushed the codex/native-thread-reuse-budget branch from 4e7ed27 to 98d1082 on May 24, 2026 11:25
@100yenadmin

Contributor Author

@clawsweeper re-review

Rebased onto current upstream/main after GitHub reported conflicts. The conflict was only the test-helper type signature that upstream had already fixed with _requestParams; kept upstream’s version. Current head is 98d1082.

Re-ran from /Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse after the rebase:

pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts extensions/codex/src/app-server/run-attempt.test.ts
pnpm tsgo:extensions:test
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
git diff --check

Results: extension test typecheck passed; context-engine 20/20 passed; native guard slice 3 passed / 215 skipped; format and whitespace checks passed.

@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the labels "proof: sufficient" (ClawSweeper judged the real behavior proof convincing.), "rating: 🐚 platinum hermit" (Good normal PR readiness with ordinary maintainer review expected.), and "status: 👀 ready for maintainer look" (ClawSweeper has no concrete contributor-facing blocker left for this PR.), and removed the labels "rating: 🦐 gold shrimp" (Decent PR readiness signal, but merge confidence is limited.) and "status: ⏳ waiting on author" (ClawSweeper has contributor-facing work open and is waiting for author action.) May 24, 2026
Add Vincent Koc as a co-author for the PR context and review trail.

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
@openclaw-barnacle openclaw-barnacle Bot removed the "proof: sufficient" label (ClawSweeper judged the real behavior proof convincing.) May 25, 2026
@100yenadmin

Contributor Author

@vincentkoc tagging you here for context and co-author credit.

This PR is part of the native-thread stabilization work: when threads are not set up correctly, cache continuity and compaction can attach to the wrong thread or break across resumes. This change fixes that path so cached context and compaction state stay attached to the intended native thread.

I also added a no-code co-author commit on this branch with Co-authored-by: Vincent Koc <vincentkoc@ieee.org>.

@clawsweeper clawsweeper Bot added the "proof: sufficient" label (ClawSweeper judged the real behavior proof convincing.) May 25, 2026
@vincentkoc vincentkoc self-assigned this May 26, 2026
@steipete

Contributor

Thanks Eva. This fix is now covered on main by 7a14741, which preserves context-engine thread-bootstrap reuse while keeping the native byte guard and moving the token guard to Codex's reported model context window with a high recovery fallback.

Proof recorded on the landed commit:

  • fnm exec --using v24.15.0 -- node scripts/run-vitest.mjs run extensions/codex/src/app-server/run-attempt.test.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts --reporter=dot --pool=forks --no-file-parallelism
  • git diff --check
  • autoreview local pass
  • Testbox check:changed tbx_01ksjm1hy7mfrc5bebzyckqdew, GitHub Actions run https://github.com/openclaw/openclaw/actions/runs/26463150977, exit 0

The PR branch now conflicts with current main because the same area has already landed there, so I am closing this as landed via that commit.
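
As a rough illustration of the guard behavior described in this closing comment (not the code in 7a14741), the sketch below decides whether a warmed native thread can be resumed. Every name in it (`NativeThreadState`, `decideNativeThreadReuse`, `FALLBACK_MAX_TOKENS`, `MAX_ROLLOUT_BYTES`) and both numeric limits are hypothetical placeholders; the only behavior taken from the thread is that the byte guard stays, the token guard follows Codex's reported model context window, and a high fallback applies when no window is reported.

```typescript
// Hypothetical sketch; none of these names or values come from the OpenClaw codebase.
interface NativeThreadState {
  /** Tokens currently held by the native thread's rollout. */
  currentTokens: number;
  /** Size of the native rollout transcript in bytes. */
  rolloutBytes: number;
  /** Context window reported by Codex for the selected model, if any. */
  reportedContextWindow?: number;
}

// Assumed values, for illustration only.
const FALLBACK_MAX_TOKENS = 1_000_000; // stand-in for the "high recovery fallback"
const MAX_ROLLOUT_BYTES = 8 * 1024 * 1024; // stand-in for the configured byte limit

type ReuseDecision = "resume" | "start-fresh";

function decideNativeThreadReuse(state: NativeThreadState): ReuseDecision {
  // The byte guard applies regardless of the model's context window.
  if (state.rolloutBytes >= MAX_ROLLOUT_BYTES) {
    return "start-fresh";
  }

  // The token guard follows the reported window, else the high fallback.
  const tokenLimit = state.reportedContextWindow ?? FALLBACK_MAX_TOKENS;
  return state.currentTokens >= tokenLimit ? "start-fresh" : "resume";
}
```

Under these assumptions, a 90_000-token bootstrap thread on a model reporting a 400_000-token window is resumed, whereas a fixed 70_000-token guard would have rotated it.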

@steipete steipete closed this May 26, 2026

Labels

  • extensions: codex
  • merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Development

Successfully merging this pull request may close these issues.

Codex app-server rotates context-engine bootstrap threads after large first turns

3 participants