
fix(codex): surface native compaction failures #85160

Merged
joshavant merged 3 commits into main from fix/codex-compaction-routing-84305 on May 22, 2026

Conversation

@joshavant

Contributor

Summary

  • Route Codex-harness compaction through the Codex app-server native compaction path, including OpenAI default Codex-runtime sessions, while falling back safely when stale harness pins no longer match.
  • Serialize local post-turn maintenance ahead of visible idle/next-turn handling, surfacing native compaction failures instead of silently continuing over budget.
  • Persist fresh post-compaction token/session metadata and harden TUI lifecycle handling so delayed finishing/end events do not clobber newer runs.
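The second and third bullets describe an ordering change: post-turn compaction must settle, and its token accounting must be recorded, before anything visible marks the run idle. The TypeScript below is a minimal, hypothetical sketch of that ordering, not the PR's code. The names finishTurn, SessionState, LifecycleEvent, and the emit callback are invented for illustration, and the real runtime persists to a session store rather than mutating an object in place.

```typescript
// Hypothetical sketch: names are illustrative, not OpenClaw's actual API.
type SessionState = {
  totalTokens: number;
  totalTokensFresh: boolean;
  compactionCount: number;
};

type LifecycleEvent = { kind: "idle" } | { kind: "error"; message: string };

async function finishTurn(
  session: SessionState,
  contextTokens: number,
  compact: () => Promise<{ totalTokens: number }>,
  emit: (event: LifecycleEvent) => void,
): Promise<void> {
  // Post-turn maintenance runs first; nothing visible happens until it settles.
  if (session.totalTokens > contextTokens) {
    try {
      const result = await compact();
      // Record fresh post-compaction accounting before the run goes idle.
      session.totalTokens = result.totalTokens;
      session.totalTokensFresh = true;
      session.compactionCount += 1;
    } catch (err) {
      // Fail closed: surface the error instead of continuing over budget.
      const reason = err instanceof Error ? err.message : String(err);
      emit({ kind: "error", message: `native compaction failed: ${reason}` });
      return;
    }
  }
  emit({ kind: "idle" });
}
```

The key design point is that emit({ kind: "idle" }) is unreachable until compaction has either succeeded and been recorded, or failed and been reported.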

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/compact.test.ts src/agents/harness/selection.test.ts src/agents/command/cli-compaction.test.ts src/tui/tui-event-handlers.test.ts src/agents/agent-command.live-model-switch.test.ts — 8 files, 193 tests passed.
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/compact.test.ts src/agents/agent-command.live-model-switch.test.ts — 3 files, 65 tests passed after rebase conflict resolution.
  • node scripts/run-vitest.mjs src/tui/tui-event-handlers.test.ts — 1 file, 50 tests passed for the post-final lifecycle regression.
  • git diff --check passed.
  • $autoreview: AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode auto --reviewer codex --fallback-reviewer none — clean, no accepted/actionable findings.

Real behavior proof

Behavior addressed: Codex-native post-turn compaction failures are no longer silent; successful native compaction waits for completion and records fresh token/session state before the run becomes idle or allows the next local turn.

Real environment tested: Live OpenClaw TUI/local Codex runtime using openai-codex/gpt-5.5, plus release-grade external smoke lanes for Discord, Slack, Telegram, and WhatsApp where credentials/infrastructure allowed.

Exact steps or command run after this patch: A final, focused fix-proof lane with a small contextTokens budget drove successful Codex-native compaction across two TUI turns; a forced native-compaction timeout lane then verified the user-visible failure path. External matrix artifacts: .artifacts/qa-e2e/issue-84305-release-smoke-2026-05-21T23-47-59-559Z/matrix-summary.json and .artifacts/qa-e2e/issue-84305-telegram-whatsapp-retry-2026-05-22T00-25-46-440Z/matrix-summary.json.

Evidence after fix: The success lane logged "started codex app-server compaction" and "completed codex app-server compaction" on both turns; session state showed contextTokens: 4000, modelProvider: openai-codex, model: gpt-5.5, agentHarnessId: codex, totalTokensFresh: true, and a compaction count incrementing to 2. The failure lane exited with status 1 and the error "CLI native harness compaction failed for openai-codex/gpt-5.5: timed out waiting for codex app-server compaction...", with no auth/provider failure.

Observed result after fix: Successful Codex-native compaction completed and persisted before idle; forced native compaction failure surfaced clearly instead of letting an over-budget session continue uncompacted. External smoke matrix product results: Codex Discord normal passed 2/2, Codex Discord forced-compaction passed 3/3, Codex Slack forced-compaction passed 7/7, Telegram normal retry passed 2/2; Telegram canary and WhatsApp were blocked by environment/credential/infrastructure limits rather than demonstrated branch regressions.

What was not tested: The GitHub "Real behavior proof" CI check is expected to be ignored for this PR, per maintainer instruction. WhatsApp could not be product-validated because the available credential was logged out (401), and some Telegram canary retries timed out in infrastructure.
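The failure lane's error ("CLI native harness compaction failed for openai-codex/gpt-5.5: timed out waiting for codex app-server compaction...") suggests a bounded wait whose timeout is rethrown as one user-visible error. The sketch below shows that pattern under stated assumptions: withTimeout and runNativeCompaction are hypothetical helpers, and the message format is modeled on the quoted log line rather than taken from the PR's source.

```typescript
// Hypothetical helper: rejects if work does not settle within ms milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out waiting for ${label}`)), ms);
  });
  // Clear the timer whichever side wins so a finished compaction leaves no pending handle.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical wrapper that turns any native compaction failure into one visible error.
async function runNativeCompaction(
  model: string,
  compact: () => Promise<void>,
  timeoutMs: number,
): Promise<void> {
  try {
    await withTimeout(compact(), timeoutMs, "codex app-server compaction");
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`CLI native harness compaction failed for ${model}: ${reason}`);
  }
}
```

Rethrowing, rather than logging and returning, is what lets the CLI exit non-zero instead of continuing over budget.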

Fixes #84305

@openclaw-barnacle (bot) added the labels agents (Agent runtime and tooling), extensions: codex, size: XL, and maintainer (Maintainer-authored PR) on May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
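The permission rules in this list can be read as a small decision table. The following is a hypothetical TypeScript encoding, not ClawSweeper's implementation: the Role and Command types are invented, "writer" stands for a user with repository write access, maintainers are assumed to have write access, and the "open PR or issue" condition is omitted for brevity.

```typescript
// Hypothetical encoding of the command rules listed above; not ClawSweeper's actual code.
type Role = "author" | "writer" | "maintainer" | "other";
type Command =
  | "re-review" | "re-run" | "review"
  | "autofix" | "automerge" | "fix ci" | "address review"
  | "explain" | "stop";

// Only these maintainer commands may start repair or merge automation.
const AUTOMATION: ReadonlySet<Command> = new Set<Command>([
  "autofix", "automerge", "fix ci", "address review",
]);

function mayRun(role: Role, command: Command): boolean {
  if (command === "re-review" || command === "re-run") {
    // Authors and users with write access may request a fresh review.
    return role === "author" || role === "writer" || role === "maintainer";
  }
  // review, explain, stop, and every automation command are maintainer-only.
  return role === "maintainer";
}

function startsAutomation(command: Command): boolean {
  // Fresh-review commands, explain, and stop never start automation.
  return AUTOMATION.has(command);
}
```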

Summary
The branch routes Codex-harness compaction through native Codex app-server compaction, serializes local post-turn maintenance before TUI idle/final handling, updates session/token accounting, and adds regression coverage plus a changelog entry.

Reproducibility: yes, for the source-level path: current main emits terminal lifecycle before post-turn maintenance and swallows CLI compaction errors under transcript persistence, while #84305 provides production traces with over-window Codex turns and compactionCount=0. I did not rerun the live Codex reproduction in this read-only review.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: Strong live proof and focused regression coverage support the fix, while the XL runtime/session-state surface keeps the overall rating at normal mergeable quality pending maintainer risk acceptance.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (logs): The PR body and follow-up comment include after-fix live OpenClaw TUI/OpenAI Codex runtime logs for successful native compaction, a forced native compaction timeout failure, and upgrade-path smokes; copied live output is sufficient for this non-visual runtime behavior.

Risk before merge

  • Merging intentionally changes nonrecoverable Codex native compaction from best-effort continuation to a visible fail-closed turn error, so existing over-budget Codex sessions may stop until the user resets or the stale-binding fallback succeeds.
  • The patch mutates session IDs/files, token freshness, compaction counts, pending final delivery state, and local TUI lifecycle ordering; stale Codex bindings and in-flight local runs are the main upgrade-safety surface.
  • The PR body contains detailed live-log proof and a follow-up upgrade smoke note, but some referenced matrix artifacts are local .artifacts paths that were not independently inspectable from this read-only checkout.
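The TUI lifecycle ordering risk ties back to the summary's goal that delayed finishing/end events must not clobber newer runs. One common way to get that property is to tag lifecycle events with a run ID and ignore events from runs that are no longer active. The sketch below assumes that approach; RunEvent and RunLifecycle are hypothetical names, not OpenClaw's TUI code.

```typescript
// Hypothetical sketch: event shapes and tracker are illustrative, not OpenClaw's TUI code.
type RunEvent = { runId: string; phase: "start" | "finishing" | "end" };

class RunLifecycle {
  private activeRunId: string | undefined;
  status: "idle" | "running" | "finishing" = "idle";

  // Returns true when the event was applied, false when it was ignored as stale.
  handle(event: RunEvent): boolean {
    if (event.phase === "start") {
      this.activeRunId = event.runId;
      this.status = "running";
      return true;
    }
    // A late finishing/end event from an older run must not touch the newer run's state.
    if (event.runId !== this.activeRunId) return false;
    this.status = event.phase === "finishing" ? "finishing" : "idle";
    if (event.phase === "end") this.activeRunId = undefined;
    return true;
  }
}
```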

Maintainer options:

  1. Accept fail-closed Codex compaction (recommended)
    Merge after maintainer review if OpenClaw should stop over-budget Codex sessions when native compaction cannot complete, relying on stale-binding fallback for recoverable upgrade cases.
  2. Ask for raw upgrade artifacts
    Before merge, require attached raw logs or artifacts for old sessions with missing and stale Codex bindings so reviewers can inspect the upgrade path directly.
  3. Pause for compatibility-first behavior
    Pause this PR if maintainers want current best-effort continuation preserved by default with a separate explicit strict compaction failure mode.
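Options 1 and 3 differ only in what happens after a native compaction failure. As a hypothetical illustration of the "separate explicit strict compaction failure mode" in option 3, a policy switch might look like the sketch below. CompactionFailurePolicy and onCompactionFailure are invented names, OpenClaw is not known to expose such a setting, and attaching a warning in best-effort mode is an added assumption (main's pre-PR behavior continued without surfacing anything).

```typescript
// Hypothetical policy switch; OpenClaw is not known to expose such an option.
type CompactionFailurePolicy = "fail-closed" | "best-effort";

type TurnOutcome =
  | { ok: true; warning?: string }
  | { ok: false; error: string };

function onCompactionFailure(policy: CompactionFailurePolicy, message: string): TurnOutcome {
  if (policy === "fail-closed") {
    // Option 1 (this PR's behavior): stop the turn with a visible error.
    return { ok: false, error: message };
  }
  // Option 3: continue uncompacted; the warning field is an assumption, not pre-PR behavior.
  return { ok: true, warning: message };
}
```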

Next step before merge
The protected maintainer label and the fail-closed session-lifecycle compatibility risk require human maintainer acceptance; I did not identify a narrow automated repair to queue.

Security
Cleared: No concrete security or supply-chain regression was found in the reviewed diff surface.

Review details

Best possible solution:

Land only after maintainers accept the fail-closed Codex compaction semantics and are satisfied that the stale-binding fallback plus session-accounting coverage are enough for upgrade safety.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level path: current main emits terminal lifecycle before post-turn maintenance and swallows CLI compaction errors under transcript persistence, while #84305 provides production traces with over-window Codex turns and compactionCount=0. I did not rerun the live Codex reproduction in this read-only review.

Is this the best way to solve the issue?

Yes, conditionally: routing Codex sessions through native app-server compaction and delaying terminal lifecycle until durable maintenance completes is the right ownership boundary. The remaining question is maintainer acceptance of the fail-closed compatibility behavior.

Label justifications:

  • P1: The PR targets a real Codex runtime/session-state failure that can leave users over context and unable to continue normal agent turns.
  • merge-risk: 🚨 compatibility: Existing sessions that previously continued after compaction failure can now fail closed with a visible turn error.
  • merge-risk: 🚨 session-state: The diff changes persisted token freshness, compaction counts, session IDs/files, pending final delivery state, and stale binding behavior.
  • merge-risk: 🚨 availability: A hung or failed native compaction path can now stop a local turn rather than letting the workflow continue uncompacted.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit: strong live proof and focused regression coverage support the fix, while the XL runtime/session-state surface keeps the overall rating at normal mergeable quality pending maintainer risk acceptance.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body and follow-up comment include after-fix live OpenClaw TUI/OpenAI Codex runtime logs for successful native compaction, a forced native compaction timeout failure, and upgrade-path smokes; copied live output is sufficient for this non-visual runtime behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and follow-up comment include after-fix live OpenClaw TUI/OpenAI Codex runtime logs for successful native compaction, a forced native compaction timeout failure, and upgrade-path smokes; copied live output is sufficient for this non-visual runtime behavior.

What I checked:

Likely related people:

  • Super Zheng: Blame and file history show commit 01d95b9 introduced the current maybeCompactCodexAppServerSession, runCliTurnCompactionLifecycle, and embedded TUI backend surfaces this PR changes. (role: introduced current compaction surface; confidence: high; commits: 01d95b9757a0; files: extensions/codex/src/app-server/compact.ts, src/agents/command/cli-compaction.ts, src/tui/embedded-backend.ts)
  • joshavant: Current-main history shows Josh Avant recently touched the Codex app-server compaction file in commit ba06376, and the PR continues that Codex runtime area. (role: recent adjacent Codex contributor; confidence: medium; commits: ba06376c7955, 205523880e75, 8389f62722d5; files: extensions/codex/src/app-server/compact.ts, src/agents/command/cli-compaction.ts)
  • Peter Steinberger: Blame on the current lifecycle/error-handling region in agent-command.ts includes commit cabb553, adjacent to the lifecycle and session-store behavior this PR changes. (role: recent agent-command/session workflow contributor; confidence: medium; commits: cabb55380f84; files: src/agents/agent-command.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against c8a35c4645dc.

@clawsweeper (bot) added labels on May 22, 2026:
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • merge-risk: 🚨 compatibility: May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 session-state: May lose, corrupt, stale, or mis-associate session, agent, or context state.
  • merge-risk: 🚨 availability: May cause crashes, hangs, restart loops, stalls, or process outages.
@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Neon Branchling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
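These three hatchability rules reduce to a short predicate: merged PRs always qualify, and open or closed-unmerged PRs need one of the hatchable labels. The TypeScript below is a hypothetical encoding for illustration; PrState and isHatchable are invented names, and the label strings are copied from the rules above.

```typescript
// Hypothetical encoding of the hatchability rules above.
type PrState = "open" | "merged" | "closed";

const HATCHABLE_LABELS: ReadonlySet<string> = new Set([
  "status: 👀 ready for maintainer look",
  "status: 🚀 automerge armed",
  "clawsweeper:automerge",
]);

function isHatchable(state: PrState, labels: readonly string[]): boolean {
  // Merged PRs are final, so they are always hatchable.
  if (state === "merged") return true;
  // Open PRs and closed-unmerged PRs both need a hatchable label in the durable record.
  return labels.some((label) => HATCHABLE_LABELS.has(label));
}
```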

Rarity: 🥚 common.
Trait: hums during re-review.
Image traits: location diff observatory; accessory little merge flag; palette sunrise gold and clean white; mood mischievous; pose curling around a status light; shell translucent glimmer shell; lighting warm desk-lamp glow; background smooth stones and checkmarks.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.
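Seeding from the repository and PR number, with the head SHA limited to cosmetic details, is a standard deterministic-hash pattern. The sketch below illustrates it under loud assumptions: the FNV-1a-style hash, the rarity weights, and the eight-palette count are all invented, and none of it is ClawSweeper's actual algorithm.

```typescript
// Hypothetical sketch: the hash, rarity weights, and palette count are assumptions.
// 32-bit FNV-1a-style hash over UTF-16 code units (real FNV-1a hashes bytes; identical for ASCII).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Cumulative percentage thresholds, rarest last; the weights are invented for illustration.
const RARITY_TABLE: ReadonlyArray<readonly [number, string]> = [
  [60, "🥚 common"],
  [85, "🌱 uncommon"],
  [95, "💎 rare"],
  [99, "✨ glimmer"],
  [100, "🌈 legendary"],
];

function hatch(repo: string, prNumber: number, headSha: string) {
  // Rarity depends only on repository + PR number, so re-reviews keep the same creature.
  const seed = fnv1a(`${repo}#${prNumber}`);
  const roll = seed % 100;
  const rarity = RARITY_TABLE.find(([limit]) => roll < limit)![1];
  // The head SHA only perturbs a cosmetic detail (here, one of 8 palettes).
  const palette = fnv1a(`${seed}:${headSha}`) % 8;
  return { rarity, palette };
}
```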

@joshavant

Contributor Author

Upgrade-path live smoke note from the follow-up testing:

I ran these through the product entrypoint (pnpm openclaw agent --local) with isolated OPENCLAW_STATE_DIR / OPENCLAW_CONFIG_PATH and live OpenAI/Codex inference. The temp state was removed afterward.
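The isolation described here (throwaway OPENCLAW_STATE_DIR and OPENCLAW_CONFIG_PATH, removed afterward) can be wrapped in a helper so every lane cleans up even when it fails. This is a hypothetical sketch using Node's fs/os/path APIs: withIsolatedOpenClawState is an invented name, and treating OPENCLAW_CONFIG_PATH as a file path (rather than a directory) is an assumption.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs a callback with OpenClaw pointed at throwaway state,
// then deletes that state whether the callback succeeds or throws.
function withIsolatedOpenClawState<T>(run: (env: NodeJS.ProcessEnv) => T): T {
  const root = mkdtempSync(join(tmpdir(), "openclaw-smoke-"));
  const stateDir = join(root, "state");
  mkdirSync(stateDir);
  const env: NodeJS.ProcessEnv = {
    ...process.env,
    OPENCLAW_STATE_DIR: stateDir,
    // Assumption: OPENCLAW_CONFIG_PATH names a config file, which may not exist yet.
    OPENCLAW_CONFIG_PATH: join(root, "config.json"),
  };
  try {
    return run(env);
  } finally {
    rmSync(root, { recursive: true, force: true });
  }
}

// Example (not executed here; spawnSync comes from node:child_process, prompt args omitted):
// withIsolatedOpenClawState((env) =>
//   spawnSync("pnpm", ["openclaw", "agent", "--local"], { env, stdio: "inherit" }));
```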

What the live lanes proved:

  • A Codex-native live probe selected openai-codex/gpt-5.5, reported agentHarnessId: codex, replied with the expected marker, and logged native app-server compaction start/completion.
  • An old Codex session with no existing .codex-app-server.json binding completed a real turn, replied with the expected marker, and completed post-turn compaction.
  • An old Codex session with a stale .codex-app-server.json binding completed a real turn, replied with the expected marker, and completed post-turn compaction.
  • A session seeded with a stale agentHarnessId: "codex" but configured back onto the PI runtime completed a real turn as openai/gpt-5.5, reported/stored agentHarnessId: pi, replied with the expected marker, and did not emit Codex-native compaction logs.
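The last lane (a stale agentHarnessId: "codex" pin on an agent configured back onto PI) matches the summary's "falling back safely when stale harness pins no longer match". A hypothetical sketch of such a selection rule follows, assuming a pin is honored only while the configured runtime can still serve that harness. selectHarness, compactionPath, and the "local" path name are invented for illustration.

```typescript
// Hypothetical sketch; names and the availability rule are assumptions, not OpenClaw's code.
type HarnessId = "codex" | "pi";

interface SessionEntry {
  agentHarnessId?: HarnessId;
}

function selectHarness(
  session: SessionEntry,
  availableForConfiguredRuntime: readonly HarnessId[],
  fallback: HarnessId,
): HarnessId {
  const pinned = session.agentHarnessId;
  // Honor the pin only while the configured runtime can still serve it.
  const selected =
    pinned !== undefined && availableForConfiguredRuntime.includes(pinned) ? pinned : fallback;
  // Store the harness that actually ran, replacing a stale pin.
  session.agentHarnessId = selected;
  return selected;
}

// Only sessions that really run on Codex take the native app-server compaction path.
function compactionPath(harness: HarnessId): "codex-app-server" | "local" {
  return harness === "codex" ? "codex-app-server" : "local";
}
```

Under this rule the stale-pin session runs as pi, stores pi, and never reaches the Codex-native compaction path, which is consistent with the absence of Codex-native compaction logs in that lane.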

Important nuance: the two Codex binding upgrade lanes are true product-entrypoint upgrade smokes, but they do not force the internal missing/stale-binding fallback branch during post-turn compaction. In a normal live Codex turn, the runtime repairs or recreates the app-server thread binding before the post-turn compaction lifecycle sees it. So those lanes prove that old sessions survive live upgrade behavior, while the deterministic unit tests remain the proof for the exact missing/stale native-compaction fallback branches.

@joshavant force-pushed the fix/codex-compaction-routing-84305 branch from b06072c to da4331f on May 22, 2026 02:31
@joshavant merged commit b8e9ab9 into main on May 22, 2026
99 checks passed
@joshavant deleted the fix/codex-compaction-routing-84305 branch on May 22, 2026 02:41
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
* fix(codex): surface native compaction failures

* docs: add changelog for codex compaction fix

* test: align compaction failure fixtures
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Codex runtime allows >2M-token turns with compactionCount=0, then contextEngine maintenance fails
