
Fix Codex native thread overflow rotation#88207

Merged
steipete merged 7 commits into
openclaw:mainfrom
fuller-stack-dev:codex/codex-terminal-overflow-binding
May 30, 2026

Conversation

@fuller-stack-dev (Contributor) commented May 30, 2026

Summary

  • keep the existing Codex terminal-overflow repair stack together: clear/resume unsafe context-engine bindings, tolerate Codex-owned automatic compaction skips, and rotate stale native Codex threads before they can overflow
  • run the native rollout token-pressure scan under default config and context-engine thread-bootstrap bindings; keep byte-size rollout truncation opt-in behind truncateAfterCompaction
  • include the projected next turn prompt/developer instructions in the native headroom check and rebuild projected context when that late guard starts a fresh thread
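The headroom check described in the last bullet can be sketched as follows. This is a minimal illustration, not the code in startup-binding.ts: the interface, the function name, and the reserve and projected-turn figures are hypothetical, and only the native token total and context window come from the state reported in this PR.

```typescript
// Hypothetical input shape; field names are illustrative, not OpenClaw's real types.
interface NativeHeadroomInput {
  nativeTotalTokens: number;   // tokens already persisted in the native rollout
  modelContextWindow: number;  // native model context window
  reserveTokens: number;       // assumed safety reserve (value below is made up)
  projectedTurnTokens: number; // estimated next prompt + developer instructions
}

// Rotate when the remaining headroom cannot hold the reserve plus the next turn.
function shouldStartFreshThread(input: NativeHeadroomInput): boolean {
  const headroom = input.modelContextWindow - input.nativeTotalTokens;
  return headroom < input.reserveTokens + input.projectedTurnTokens;
}

// Native figures reported in this PR; reserve and projection are assumptions.
const observed = { nativeTotalTokens: 241198, modelContextWindow: 258400 };

const withProjection = shouldStartFreshThread({
  ...observed,
  reserveTokens: 16000,
  projectedTurnTokens: 4000,
});

const withoutProjection = shouldStartFreshThread({
  ...observed,
  reserveTokens: 16000,
  projectedTurnTokens: 0,
});

console.log({ withProjection, withoutProjection });
```

With these assumed values the thread keeps 17202 tokens of headroom, which clears the reserve alone but not the reserve plus the projected turn. That is the case the late guard is meant to catch.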

This PR is intentionally scoped to the Codex terminal-overflow path:

extensions/codex/src/app-server/run-attempt.context-engine.test.ts
extensions/codex/src/app-server/run-attempt.test.ts
extensions/codex/src/app-server/run-attempt.ts
extensions/codex/src/app-server/startup-binding.test.ts
extensions/codex/src/app-server/startup-binding.ts
src/agents/command/cli-compaction.test.ts
src/agents/command/cli-compaction.ts
src/gateway/session-utils.fs.ts

Behavioral Proof

Before, a budget-triggered Codex app-server session entered the bad path because OpenClaw skipped non-manual Codex app-server compaction and then failed the native compaction fallback:

openclaw-2026-05-29.log:<line> skipping codex app-server compaction for non-manual trigger sessionId=<redacted-session-id> sessionKey=<redacted-channel-session-key> trigger=budget
openclaw-2026-05-29.log:<line> agent errorCode=UNAVAILABLE errorMessage=Error: CLI native harness compaction failed for openai-codex/gpt-5.5: codex app-server owns automatic compaction runId=<redacted-run-id>

The matching runtime state had OpenClaw session totals below the budget but Codex native rollout pressure already near the native window:

sessionTotalTokens=87792
sessionContextTokens=272000
nativeTotalTokens=241198
nativeModelContextWindow=258400
config=default / truncateAfterCompaction unset
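To make the gap between the two counters concrete, the arithmetic on the values above can be worked through in a few lines. The helper name is hypothetical; the numbers are the ones listed in the snapshot.

```typescript
// Fraction of a window that is already used.
function utilization(usedTokens: number, windowTokens: number): number {
  return usedTokens / windowTokens;
}

// OpenClaw's own session accounting: well under its budget.
const sessionUse = utilization(87792, 272000);

// Codex native rollout accounting: close to the native window.
const nativeUse = utilization(241198, 258400);

// Tokens left in the native window before the next turn is added.
const nativeHeadroom = 258400 - 241198;

console.log(
  `session ${(sessionUse * 100).toFixed(1)}%, ` +
    `native ${(nativeUse * 100).toFixed(1)}%, ` +
    `native headroom ${nativeHeadroom} tokens`,
);
```

The session view sits at roughly a third of its window while the native rollout is above 93% with 17202 tokens left, which is why a check based only on session totals did not flag the thread.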

After installing this branch locally as the packaged OpenClaw build from this PR and restarting the managed Gateway onto the same build, I recreated that state with redacted synthetic identifiers, trigger=budget, model openai-codex/gpt-5.5, the default compaction config, a stale Codex binding, and the native rollout token snapshot above. The installed build produced:

[agent/embedded] codex app-server native transcript exceeded active token limit; starting a fresh thread
methods: thread/start, turn/start, thread/unsubscribe
resumedOversizedThread: false
startedFreshThreadBeforeTurn: true
savedThreadId: thread-fresh-before-overflow

Verification

  • CI=true pnpm build
  • pnpm pack --pack-destination <temp-dir>
  • pnpm add -g <packed-openclaw-tarball>
  • openclaw --version -> packaged OpenClaw build from this PR
  • openclaw gateway install --force --port <local-port> --wrapper <local-openclaw-wrapper>
  • openclaw gateway status --deep -> CLI and Gateway both on the packaged PR build
  • installed-package synthetic repro described above
  • node --no-maglev <vitest> run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/startup-binding.test.ts --reporter=verbose -> 12 passed
  • node --no-maglev <vitest> run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "starts a fresh Codex thread before turn/start when the next prompt would exhaust native headroom" --reporter=verbose -> 1 passed
  • node --no-maglev <vitest> run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts -t "starts a fresh thread instead of resuming a token-pressured thread-bootstrap binding|resumes a matching thread-bootstrap binding even when the bootstrap turn exceeded the opt-in native byte guard" --reporter=verbose -> 2 passed
  • git diff --check

Real behavior proof

  • Behavior addressed: Budget-triggered Codex app-server sessions no longer resume a native Codex thread whose persisted rollout token usage is already too close to the native context window; OpenClaw starts a fresh Codex thread before turn/start instead of entering the previous overflow/failed-compaction state.

  • Real environment tested: Local packaged OpenClaw install from this PR, with the managed Gateway reinstalled/restarted onto the same package on a local loopback port.

  • Exact steps or command run after this patch: Built and packed the branch, installed the tarball globally, reinstalled the Gateway LaunchAgent using a local openclaw wrapper, then executed an installed-package run-attempt repro with redacted session identifiers, budget trigger, default compaction config, stale Codex binding, sessionTotalTokens=87792, contextTokens=272000, native rollout total_tokens=241198, and model_context_window=258400.

  • Evidence after fix: Redacted terminal transcript from the installed packaged build:

    [agent/embedded] codex app-server native transcript exceeded active token limit; starting a fresh thread
    methods: thread/start, turn/start, thread/unsubscribe
    resumedOversizedThread: false
    startedFreshThreadBeforeTurn: true
    savedThreadId: thread-fresh-before-overflow
    
  • Observed result after fix: Copied live output showed thread/start before turn/start, no thread/resume for the oversized stale thread, and a rewritten binding value of thread-fresh-before-overflow.

  • What was not tested: A live OpenAI Codex model call was not made; the proof uses the installed OpenClaw package and a local app-server harness seeded with the persisted session/rollout token state from the observed failure.
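The observed result can be expressed as a small check over the recorded method sequence. This is a sketch only: the function name is hypothetical and it is not taken from the PR's test files. It encodes the two properties stated above, that thread/start precedes turn/start and that thread/resume never appears.

```typescript
// Returns true when a fresh thread was started before the turn and the
// stale thread was never resumed.
function startedFreshThreadBeforeTurn(methods: readonly string[]): boolean {
  const startIndex = methods.indexOf("thread/start");
  const turnIndex = methods.indexOf("turn/start");
  const resumed = methods.indexOf("thread/resume") !== -1;
  return !resumed && startIndex !== -1 && turnIndex !== -1 && startIndex < turnIndex;
}

// Sequence copied from the transcript in this PR.
const observedMethods = ["thread/start", "turn/start", "thread/unsubscribe"];

// Illustrative counter-example: resuming the oversized thread.
const resumedMethods = ["thread/resume", "turn/start"];

console.log(startedFreshThreadBeforeTurn(observedMethods));
console.log(startedFreshThreadBeforeTurn(resumedMethods));
```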

@openclaw-barnacle Bot added labels May 30, 2026:

  • gateway: Gateway runtime
  • agents: Agent runtime and tooling
  • extensions: codex
  • size: L
  • triage: mock-only-proof: Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.

@clawsweeper Bot (Contributor) commented May 30, 2026

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 10:49 PM ET / 02:49 UTC.

Summary
The branch updates Codex app-server startup/run-attempt compaction handling so token-pressured native threads rotate before turn/start, Codex-owned automatic compaction skips fall back to context-engine compaction, and regression tests cover the default token-pressure path.

PR surface: Source +169, Tests +379. Total +548 across 7 files.

Reproducibility: yes. Current main source skips the native token-pressure scan under default config, and the PR body provides observed logs plus a seeded installed-package repro for the overflow state.

Review metrics: 1 noteworthy metric.

  • Default Codex rotation behavior: 1 default path changed, 0 new config keys. The token-pressure guard now runs under default compaction config while byte-size truncation remains opt-in, so maintainers should notice the upgrade behavior before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
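The "weaker of the two" rule can be sketched as a minimum over an ordered rank list. The ordering below is read from the rank legend later in this comment, taking the legend's top-to-bottom order as strongest to weakest. The names `RANKS` and `overallRank` are hypothetical, and the off-meta tidepool rank is left out because it marks items the rating does not apply to.

```typescript
// Ranks from weakest to strongest, per the legend in this review comment.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever of proof or patch quality is weaker.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// The ratings reported for this PR.
const overall = overallRank("diamond lobster", "platinum hermit");
console.log(overall);
```

For this PR the diamond lobster proof and platinum hermit patch quality give platinum hermit overall, matching the rating above.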

Rank-up moves:

  • Confirm maintainer acceptance of the default token-pressure fresh-thread behavior before landing.

Risk before merge

  • [P1] Merging intentionally changes default Codex resume behavior: existing users with persisted oversized native rollout state may get a fresh native thread even when truncateAfterCompaction is unset.
  • [P1] The supplied proof uses an installed packaged OpenClaw build and local app-server harness seeded with observed state, but it does not include a live OpenAI Codex model call.

Maintainer options:

  1. Land With Default Rotation Acknowledged (recommended)
    Accept that upgrades may clear oversized persisted native Codex thread bindings under default config to avoid terminal overflow.
  2. Gate The Default Change
    If maintainers do not want default fresh-thread rotation, keep current default resume behavior and make token-pressure rotation opt-in or migration-backed before merge.

Next step before merge

  • No automated repair is identified; maintainers should make the merge call on the intentional default session-rotation behavior and normal CI gates.

Security
Cleared: The diff changes Codex runtime/session handling and tests only; I found no concrete security or supply-chain issue such as dependency, workflow, script, permission, or secret-handling expansion.

Review details

Best possible solution:

Land the scoped Codex fix after maintainers accept the default fresh-thread rotation tradeoff and focused Codex checks are green.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source skips the native token-pressure scan under default config, and the PR body provides observed logs plus a seeded installed-package repro for the overflow state.

Is this the best way to solve the issue?

Yes. The patch keeps the fix in the Codex app-server startup/run-attempt path, preserves byte truncation as opt-in, and adds focused regression tests; the main remaining question is accepting the intentional default fresh-thread behavior.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e9dee8dfe158.

Label changes

Label changes:

  • add P1: The PR addresses a broken Codex agent compaction/overflow workflow that can surface as an unavailable agent run for real users.
  • add merge-risk: 🚨 compatibility: The diff changes default Codex startup behavior for existing persisted native thread bindings without adding a new operator setting.
  • add merge-risk: 🚨 session-state: The fix intentionally clears and rewrites Codex native thread bindings when rollout token pressure is too high.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live output from an installed packaged OpenClaw build with the managed Gateway restarted onto that package, seeded with the observed token-pressure state.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix live output from an installed packaged OpenClaw build with the managed Gateway restarted onto that package, seeded with the observed token-pressure state.

Label justifications:

  • P1: The PR addresses a broken Codex agent compaction/overflow workflow that can surface as an unavailable agent run for real users.
  • merge-risk: 🚨 compatibility: The diff changes default Codex startup behavior for existing persisted native thread bindings without adding a new operator setting.
  • merge-risk: 🚨 session-state: The fix intentionally clears and rewrites Codex native thread bindings when rollout token pressure is too high.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix live output from an installed packaged OpenClaw build with the managed Gateway restarted onto that package, seeded with the observed token-pressure state.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live output from an installed packaged OpenClaw build with the managed Gateway restarted onto that package, seeded with the observed token-pressure state.

Evidence reviewed

PR surface:

Source +169, Tests +379. Total +548 across 7 files.

Area       Files  Added  Removed  Net
Source     3      218    49       +169
Tests      4      470    91       +379
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      7      688    140      +548

What I checked:

  • Root and scoped policy read: Read full root AGENTS.md plus relevant scoped guides for extensions/, src/agents/, and src/gateway/; their proof, compatibility, and session-state guidance shaped this review. (AGENTS.md:1, e9dee8dfe158)
  • Current-main bug surface: Current main returns the existing Codex startup binding unless truncateAfterCompaction is true, so the native rollout token scan cannot protect default-config sessions from the reported token-pressure overflow. (extensions/codex/src/app-server/startup-binding.ts:244, e9dee8dfe158)
  • PR token-pressure guard: The PR head moves token-pressure evaluation ahead of the opt-in byte guard, uses configured/default reserve tokens plus projected turn tokens, and clears the binding when the native thread is too close to the effective context window. (extensions/codex/src/app-server/startup-binding.ts:280, 2a87c212f1fc)
  • PR pre-turn rotation: The PR estimates the rendered Codex turn prompt and developer instructions, reruns startup-binding rotation before turn/start, and rebuilds projection state when the guard starts a fresh thread. (extensions/codex/src/app-server/run-attempt.ts:813, 2a87c212f1fc)
  • Regression coverage added: The diff adds tests for default token-pressure checks, projected-turn token headroom, session context-window fallback, fresh thread startup before turn/start, context-engine thread-bootstrap rotation, and Codex-owned auto-compaction fallback. (extensions/codex/src/app-server/startup-binding.test.ts:40, 2a87c212f1fc)
  • Real behavior proof in PR body: The PR body supplies redacted installed-package proof from a local packaged OpenClaw build and managed Gateway, showing thread/start before turn/start, no oversized thread/resume, and a rewritten binding ID; it explicitly notes no live OpenAI Codex model call was made. (2a87c212f1fc)
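The change in evaluation order described in the "Current-main bug surface" and "PR token-pressure guard" items can be contrasted in a short sketch. Everything here is hypothetical: the types, function names, and outcome labels are illustrative, and the ordering of checks on main when truncateAfterCompaction is enabled is an assumption. Only the two documented facts are modelled: main returns the existing binding unless truncateAfterCompaction is true, and the PR evaluates token pressure before the opt-in byte guard.

```typescript
type BindingDecision = "resume" | "clear-binding" | "run-byte-guard";

interface BindingState {
  tokenPressured: boolean;          // native thread too close to the context window
  truncateAfterCompaction: boolean; // opt-in byte-size guard; unset means false
}

// Behavior described for current main: no scan at all under default config.
function decideOnMain(state: BindingState): BindingDecision {
  if (!state.truncateAfterCompaction) return "resume";
  return state.tokenPressured ? "clear-binding" : "run-byte-guard";
}

// Behavior described for the PR: token pressure is checked first, always.
function decideOnPr(state: BindingState): BindingDecision {
  if (state.tokenPressured) return "clear-binding";
  return state.truncateAfterCompaction ? "run-byte-guard" : "resume";
}

// The reported failure state: default config with a token-pressured thread.
const reported: BindingState = { tokenPressured: true, truncateAfterCompaction: false };
console.log(decideOnMain(reported), decideOnPr(reported));
```

Under this model the reported state resumes the oversized thread on main and clears the binding on the PR head, while a default-config session without token pressure still resumes in both.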

Likely related people:

  • steipete: The central current-main startup-binding, CLI compaction, and context-engine overflow handling lines blame to 69550a9d3dda96eae46e74985268649272bd8395, authored by Peter Steinberger. (role: introduced behavior / recent area contributor; confidence: medium; commits: 69550a9d3dda; files: extensions/codex/src/app-server/startup-binding.ts, extensions/codex/src/app-server/run-attempt.ts, src/agents/command/cli-compaction.ts)
  • joshavant: Recent merged Codex app-server run-attempt work touched the same large runtime file shortly before this PR, so this is a plausible review/routing candidate for adjacent behavior context. (role: recent adjacent contributor; confidence: medium; commits: f870beac85ec; files: extensions/codex/src/app-server/run-attempt.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@fuller-stack-dev force-pushed the codex/codex-terminal-overflow-binding branch from ac61677 to 444d102, May 30, 2026 02:40

@openclaw-barnacle Bot changed labels May 30, 2026:

  • added proof: supplied: External PR includes structured after-fix real behavior proof.
  • removed gateway: Gateway runtime
  • removed triage: mock-only-proof: Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.

@clawsweeper Bot added labels May 30, 2026:

  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • merge-risk: 🚨 compatibility: May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 session-state: May lose, corrupt, stale, or mis-associate session, agent, or context state.

@steipete self-assigned this May 30, 2026

@steipete force-pushed the codex/codex-terminal-overflow-binding branch from 2a87c21 to 466bfbe, May 30, 2026 06:02

@openclaw-barnacle Bot removed the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 30, 2026

@steipete merged commit 81505ad into openclaw:main May 30, 2026
107 checks passed

Labels

  • agents: Agent runtime and tooling
  • extensions: codex
  • merge-risk: 🚨 compatibility: May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 session-state: May lose, corrupt, stale, or mis-associate session, agent, or context state.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • proof: supplied: External PR includes structured after-fix real behavior proof.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: L
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

2 participants