
fix(codex): recover context overflow and budget skip paths #87879

Closed
fuller-stack-dev wants to merge 5 commits into openclaw:main from fuller-stack-dev:fix/codex-terminal-overflow-binding

Conversation

@fuller-stack-dev

@fuller-stack-dev fuller-stack-dev commented May 29, 2026

Contributor

Summary

This PR fixes two tightly related Codex native-session recovery edges found while replaying a Discord-shaped session after the 5.27 upgrade:

  1. If a resumed Codex app-server turn accepts turn/start but later terminally fails with a context-window overflow while the context engine owns compaction, clear the stale app-server thread binding so the outer retry starts a fresh native Codex thread.
  2. If the CLI budget-compaction lifecycle receives Codex app-server's intentional automatic-compaction skip (reason: "codex app-server owns automatic compaction"), treat that as a non-fatal skip instead of failing a successful turn after the model already responded.

Real behavior proof

  • Behavior addressed: Codex native app-server sessions recover from a terminal context-window overflow on a resumed context-engine thread by clearing the stale thread binding, and CLI budget compaction treats Codex app-server's intentional automatic-compaction skip as non-fatal after a successful turn.
  • Real environment tested: Local OpenClaw source checkout installed into the gateway-resolved global OpenClaw prefix, Node 24, Codex app-server runtime, local gateway health endpoint, and a Discord-shaped agent route. Channel/session identifiers were anonymized.
  • Exact steps or command run after this patch:
pnpm build
npm install -g .
<gateway-openclaw> --version
curl -sS -m 5 http://127.0.0.1:18789/health
openclaw agent --session-key <discord-session> --channel discord --message "[exact-route-smoke] Reply with exactly: OK" --timeout 900 --json
  • Evidence after fix: The local runtime reported OpenClaw 2026.5.28 (3fd6cc0); the gateway health endpoint returned {"ok":true,"status":"live"}; the exact-route smoke returned status=ok, payload.text="OK", and exitCode=0; and the anonymized logs contained "skipping codex app-server compaction for non-manual trigger" without the prior fatal compaction failure.
  • Observed result after fix: The replay started a fresh native Codex thread after the stale resumed thread overflowed, subsequent turns resumed that new thread, the old "Auto-compaction could not recover this turn" message did not recur, and the successful assistant response was preserved instead of being converted into a CLI failure.
  • What was not tested: Live Discord delivery with real channel IDs was not repeated in the PR body evidence; local identifiers, channel IDs, session IDs, auth profile IDs, hostnames, and absolute paths were redacted.

Before Evidence

All identifiers below are anonymized. No personal names, hostnames, channel IDs, session IDs, auth profile IDs, or absolute local paths are included.

Overflow recovery before patch

A production-like replay showed the context-engine projection repeatedly resuming the same stale native thread:

session=<discord-session>
previousThreadId=<old-native-thread>
projectedPromptChars≈222k
native action=resumed thread=<old-native-thread>

The app-server then reported auto-compaction success, but the next retry still resumed <old-native-thread> instead of starting fresh. The turn eventually exhausted recovery and surfaced the user-facing error:

Auto-compaction could not recover this turn. I kept this conversation mapped to the current session.

There was no log evidence that the binding was cleared after the terminal context-window failure, because the overflow happened after turn/start had already been accepted.
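
The condition described above can be sketched roughly as follows. This is a minimal illustration, not the code in run-attempt.ts: the `Binding` and `TurnOutcome` shapes, the `isContextOverflow` heuristic, and the `bindingAfterTurn` helper are all hypothetical names introduced here.

```typescript
// Hypothetical shapes for illustration only; not the actual OpenClaw types.
type Binding = { threadId: string };

interface TurnOutcome {
  turnStartAccepted: boolean; // the app-server accepted turn/start
  terminalError?: string;     // terminal failure reported later, if any
}

// Assumed heuristic for spotting a context-window overflow in an error message.
function isContextOverflow(message: string | undefined): boolean {
  return message !== undefined && /context[- ]window|context length/i.test(message);
}

// Returns the binding the outer retry should use; undefined means "start a fresh thread".
function bindingAfterTurn(
  binding: Binding | undefined,
  resumed: boolean,
  contextEngineOwnsCompaction: boolean,
  outcome: TurnOutcome,
): Binding | undefined {
  const staleAfterOverflow =
    resumed &&
    contextEngineOwnsCompaction &&
    outcome.turnStartAccepted &&
    isContextOverflow(outcome.terminalError);
  return staleAfterOverflow ? undefined : binding;
}
```

The point of the sketch is that the overflow is only detectable after turn/start has been accepted, so the clearing decision has to be made from the terminal turn result rather than from a start-time failure.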

Budget compaction before patch

After the first local fix was installed, a Discord-shaped CLI smoke successfully wrote the assistant response:

payload.text = "OK"
provider = openai-codex
action = completed

But the CLI still exited non-zero during the post-turn budget lifecycle:

skipping codex app-server compaction for non-manual trigger trigger=budget
CLI native harness compaction did not reduce context
failureReason="codex app-server owns automatic compaction"

So the user-visible response succeeded, then the CLI incorrectly converted the intentional Codex app-server budget skip into a failed command.
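
A rough sketch of the distinction the CLI needs to draw is shown below. The `ok`, `compacted`, and `reason` fields and the reason string come from the logs and review notes in this conversation; the `NativeCompactionResult` interface, the `BudgetOutcome` type, and `classifyBudgetCompaction` are hypothetical names used only for illustration.

```typescript
// Hypothetical result shape, loosely modelled on the fields quoted in this PR.
interface NativeCompactionResult {
  ok: boolean;
  compacted: boolean;
  reason?: string;
}

// Reason string quoted in the logs above.
const CODEX_OWNS_AUTO_COMPACTION = "codex app-server owns automatic compaction";

type BudgetOutcome = "compacted" | "intentional-skip" | "failed";

// Separates an intentional Codex-owned skip from a genuine compaction failure.
function classifyBudgetCompaction(result: NativeCompactionResult): BudgetOutcome {
  if (result.ok && result.compacted) return "compacted";
  if (result.ok && result.reason === CODEX_OWNS_AUTO_COMPACTION) return "intentional-skip";
  return "failed";
}
```

Under this assumption, only the "failed" outcome should turn an otherwise successful turn into a non-zero exit.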

After Evidence

Overflow recovery after patch

With the stale binding restored locally for replay, the same Discord-shaped session no longer reused <old-native-thread>. The retry cleared the binding and started a new native thread:

native action=started thread=<new-native-thread>
follow-up native action=resumed thread=<new-native-thread>

No context-overflow-diag, exhausted recovery, or auto-compaction recovery failure appeared during the replay. The assistant transcript completed with:

payload.text = "OK"

Budget compaction after patch

After adding the CLI-side skip handling and installing this branch into the local runtime, the same smoke command completed successfully:

status = ok
summary = completed
payload.text = "OK"
provider = openai-codex
model = gpt-5.5
exitCode = 0

This shows the CLI now preserves the successful turn result when Codex app-server intentionally owns automatic compaction for a budget trigger.

Follow-up Exact-Route Smoke

A follow-up Discord-shaped replay intentionally used the long session that previously triggered the user-facing failure. The first replay reproduced a failure because the shell CLI was patched, but the LaunchAgent gateway was still resolving a separate global OpenClaw install at 2026.5.27.

After installing this same PR commit into the gateway-resolved global prefix and restarting the gateway:

<gateway-openclaw> --version
# OpenClaw 2026.5.28 (3fd6cc0)

curl -sS -m 5 http://127.0.0.1:18789/health
# {"ok":true,"status":"live"}

openclaw agent --session-key <discord-session> --channel discord --message "[exact-route-smoke] Reply with exactly: OK" --timeout 900 --json
# exitCode=0, status=ok

Correlated anonymized log evidence from the successful replay:

[agent/embedded] skipping codex app-server compaction for non-manual trigger
OK

The old fatal lines did not recur in that successful replay:

CLI native harness compaction failed ... codex app-server owns automatic compaction
Auto-compaction could not recover this turn

Tests

Focused tests:

node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/command/cli-compaction.test.ts -t "treats Codex automatic compaction ownership as a non-fatal CLI budget skip"
node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts -t "clears a resumed context-engine binding when a turn terminally overflows"
pnpm tsgo:prod
node scripts/run-oxlint-shards.mjs --threads=8

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 29, 2026
@clawsweeper

clawsweeper Bot commented May 29, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 10:32 PM ET / 02:32 UTC.

Summary
The branch clears resumed Codex app-server context-engine bindings after terminal context overflow and makes CLI budget compaction fall back to context-engine compaction when Codex reports it owns automatic compaction.

PR surface: Source +40, Tests +164. Total +204 across 4 files.

Reproducibility: yes. Current main source shows Codex non-manual compaction returns the owned-skip reason while CLI compaction throws on non-fallback native outcomes; the PR body also includes before/after redacted gateway logs for the same failure mode.

Review metrics: 1 noteworthy metric.

  • Recovery Semantics Touched: 2 session-state paths changed. One path clears persisted Codex thread bindings after terminal overflow and one changes CLI behavior after Codex-owned compaction skips, so maintainer review should focus on session continuity.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Maintainers should choose whether this PR is the canonical budget-compaction recovery behavior or supersede it with one of the overlapping open PRs.
  • If the binding-clear boundary remains uncertain, ask for one follow-up-turn proof showing the fresh native thread is reused after recovery.

Risk before merge

  • [P1] Merging changes persisted Codex app-server thread binding behavior: a resumed context-engine thread that terminally overflows will drop native thread continuity so a later retry can start fresh.
  • [P1] Open overlapping compaction PRs propose different budget-compaction ownership behavior, so maintainers should choose one canonical contract before landing to avoid conflicting session-state semantics.

Maintainer options:

  1. Land This As The Canonical Recovery Contract (recommended)
    If maintainers accept context-engine fallback after Codex-owned automatic-compaction skips and stale-binding clearing after terminal overflow, land this PR and close or supersede the overlapping budget-compaction PRs.
  2. Choose A Different Budget-Compaction Contract
    If maintainers prefer the native-owned skip to proceed without context-engine fallback, pause this PR and use the narrower competing direction for the budget-skip half.
  3. Ask For Extra Session-Continuity Proof
    If the binding-clear boundary is still uncertain, require a focused follow-up-turn proof showing the next turn starts fresh and then resumes the new native thread without losing the visible assistant result.

Next step before merge

  • [P2] The remaining action is maintainer selection of the session-state and compaction ownership contract, not a narrow automated repair.

Security
Cleared: The diff changes TypeScript runtime/test handling for Codex compaction and binding recovery, with no dependency, CI, secret, permission, or supply-chain surface changes found.

Review details

Best possible solution:

Adopt one canonical Codex app-server recovery contract that preserves successful turns, clears stale resumed context-engine bindings after terminal overflow, and retires the overlapping compaction PRs after the chosen path lands.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source shows Codex non-manual compaction returns the owned-skip reason while CLI compaction throws on non-fallback native outcomes; the PR body also includes before/after redacted gateway logs for the same failure mode.

Is this the best way to solve the issue?

Unclear but promising. The code change is narrow and tested, but maintainers still need to choose whether context-engine fallback after a Codex-owned budget skip is the canonical behavior versus the overlapping open PRs.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 440e737c67dc.

Label changes

Label justifications:

  • P1: The PR addresses Codex context-overflow and CLI compaction failures that can break active agent/channel turns for real users.
  • merge-risk: 🚨 session-state: The diff intentionally clears persisted Codex thread bindings and changes compaction fallback behavior for existing Codex sessions.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body includes after-fix terminal/log proof from a local installed gateway runtime and exact-route agent smoke, with private identifiers redacted.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal/log proof from a local installed gateway runtime and exact-route agent smoke, with private identifiers redacted.
Evidence reviewed

PR surface:

Source +40, Tests +164. Total +204 across 4 files.

Area       Files  Added  Removed  Net
Source     2      42     2        +40
Tests      2      164    0        +164
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      4      206    2        +204

What I checked:

  • Repository policy read: Read the full root AGENTS.md and applied its ClawSweeper review guidance for session-state and fallback behavior as compatibility-sensitive merge risk. (AGENTS.md:17, 440e737c67dc)
  • Scoped policies read: Read the scoped guides for the touched plugin and agent paths; extensions/AGENTS.md keeps bundled plugin boundaries explicit and src/agents/AGENTS.md covers agent test/runtime guardrails. (extensions/AGENTS.md:20, 440e737c67dc)
  • Current main Codex skip contract: Current main returns ok=true, compacted=false, reason='codex app-server owns automatic compaction' for non-manual Codex app-server compaction triggers. (extensions/codex/src/app-server/compact.ts:131, 440e737c67dc)
  • Current main CLI failure path: Current main throws when native harness CLI compaction returns a non-compacted result without a recognized fallback, which covers the Codex-owned automatic compaction skip. (src/agents/command/cli-compaction.ts:523, 440e737c67dc)
  • PR CLI recovery path: The PR recognizes the Codex-owned automatic compaction skip and returns fallbackToContextEngine=true instead of letting the CLI turn fail. (src/agents/command/cli-compaction.ts:415, 51ce7910fe1d)
  • PR terminal-overflow recovery path: The PR clears the persisted Codex app-server binding when a resumed context-engine thread terminally fails with a context-window error after turn/start succeeds. (extensions/codex/src/app-server/run-attempt.ts:1967, 51ce7910fe1d)

Likely related people:

  • steipete: Git log/blame show Peter Steinberger introduced or carried the current Codex app-server and CLI compaction surfaces that this PR changes, including current-main d92a029 and earlier Codex app-server harness work. (role: feature owner and recent area contributor; confidence: high; commits: d92a0292a966, 27ae826f6525, dd26e8c44d4e; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/compact.ts, src/agents/command/cli-compaction.ts)
  • vincentkoc: Recent history shows Vincent Koc touched Codex app-server startup/session binding behavior around auth profile reuse, adjacent to the binding lifecycle changed here. (role: adjacent Codex session/auth contributor; confidence: medium; commits: f1cc8f0cfc7c, 859eb0666282; files: extensions/codex/src/app-server/run-attempt.ts)
  • joshavant: Recent main history shows Josh Avant touched run-attempt.ts for Codex app-server projection/media handling, adjacent to the terminal-turn result path changed here. (role: recent adjacent contributor; confidence: low; commits: f870beac85ec; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label May 29, 2026
@fuller-stack-dev fuller-stack-dev changed the title fix(codex): clear resumed context thread after terminal overflow fix(codex): recover context overflow and budget skip paths May 29, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@fuller-stack-dev fuller-stack-dev force-pushed the fix/codex-terminal-overflow-binding branch from fff0954 to 07467ce Compare May 29, 2026 05:25
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. gateway Gateway runtime and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 29, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026

@compoodment compoodment left a comment

Contributor

Review: upstream guard blocks CLI compaction for Codex sessions entirely

Both commits are correct in isolation, but there is a structural gap upstream of both fixes that leaves Codex sessions without any compaction when the Codex app-server thread binding is absent.

The guard blocks everything

shouldSkipAutomaticCompactionForCodexRuntime() (cli-compaction.ts ~L268) runs before isNativeHarnessCompactionSession and compactNativeHarnessCliTranscript. For any openai-codex provider session where resolveAgentHarnessPolicy returns runtime: "codex", this function returns true and the entire runCliTurnCompactionLifecycle short-circuits — it returns sessionEntry unchanged with a debug log.

This means:

  1. Commit 2's fix (isIntentionalNativeAutoCompactionSkip → skipped: true → early return) is unreachable for Codex sessions because the outer guard already skipped them.

  2. When Commit 1 clears the thread binding after terminal overflow, the Codex app-server no longer has a thread to compact. But shouldSkipAutomaticCompactionForCodexRuntime still returns true, so OpenClaw CLI compaction is also blocked. Nobody handles compaction.

Reproduction

On Dicky's OpenClaw instance (2026.5.27-beta.1, both commits patched in), a Discord session on openai-codex/gpt-5.5 grew to 184 messages / 277k prompt tokens / 22k overflow tokens with compactionCount: 0. The session never compacted because shouldSkipAutomaticCompactionForCodexRuntime returned true every turn.

Suggested fix

When a Codex session has no active thread binding (i.e., readCodexAppServerBinding returns undefined), shouldSkipAutomaticCompactionForCodexRuntime should return false so the CLI compaction lifecycle falls through to context-engine compaction. Alternatively, compactNativeHarnessCliTranscript could detect the missing binding and set fallbackToContextEngine: true so the caller falls through to context-engine compaction instead of returning early on the skipped path.

The current skipped early-return in runCliTurnCompactionLifecycle should also fall through to context-engine compaction rather than returning sessionEntry unchanged — otherwise the session will grow unbounded with no compaction at all.

@compoodment

Contributor

Correction: my earlier review referenced shouldSkipAutomaticCompactionForCodexRuntime as a problem with this PR. That function does not exist on main or in this PR branch; it was introduced in the 2026.5.27-beta.1 release. On main, runCliTurnCompactionLifecycle goes directly to isNativeHarnessCompactionSession, which correctly handles the missing-binding case by falling through to context-engine compaction.

The actual gap I encountered was specific to the beta release, not this PR. The PR itself correctly returns fallbackToContextEngine: true from compactNativeHarnessCliTranscript when the Codex app-server skips auto-compaction, which allows the caller to fall through to context-engine compaction. My initial dist JS patch incorrectly used skipped: true with an early return instead — that was a patching error on my end, not a PR issue.
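
To make the difference between the two outcomes concrete, here is a minimal sketch. The `fallbackToContextEngine` and `skipped` flags are the ones named in this thread; the `NativeOutcome` shape, the `CompactionPath` type, and `resolveCompactionPath` are hypothetical and do not reflect the real structure of runCliTurnCompactionLifecycle.

```typescript
// Hypothetical outcome shape; only the two flag names come from the discussion.
interface NativeOutcome {
  compacted: boolean;
  fallbackToContextEngine?: boolean;
  skipped?: boolean;
}

type CompactionPath = "native" | "context-engine" | "none";

// Decides which component, if any, ends up compacting the session for this turn.
function resolveCompactionPath(outcome: NativeOutcome): CompactionPath {
  if (outcome.compacted) return "native";
  if (outcome.fallbackToContextEngine) return "context-engine";
  // A bare skip returns early, so nothing compacts the session on this turn.
  return "none";
}
```

With `skipped: true` and an early return, a session can keep growing with no compaction at all, which matches the behavior seen with the mistaken dist patch; with `fallbackToContextEngine: true`, the caller still has a compaction path.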

The PR changes are correct as-is. Apologies for the noise.

…overflow-binding

# Conflicts:
#	extensions/codex/src/app-server/run-attempt.ts
@openclaw-barnacle openclaw-barnacle Bot removed the gateway Gateway runtime label May 30, 2026
@steipete steipete self-assigned this May 30, 2026
@steipete

Contributor

Thanks Jason. I landed #88207 as the superset fix in 81505ad.

#88207 includes this PR's Codex budget-skip recovery and stale resumed context-engine binding cleanup, then adds the native thread overflow rotation/headroom checks and the maintainer follow-ups from review. Closing this one so we do not keep two divergent fixes open for the same session-state path.

@steipete steipete closed this May 30, 2026

Labels

agents Agent runtime and tooling extensions: codex merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

3 participants