Fail Codex compaction at the Codex boundary #85958
Codex review: needs maintainer review before merge. Reviewed May 24, 2026, 11:27 PM ET / 03:27 UTC.

Summary

PR surface: Source -556, Tests -637, Docs +5. Total -1188 across 18 files.

Reproducibility: yes. Source inspection on current main shows the OpenClaw-owned Codex preflight, context-engine, wait/restart, and fallback compaction paths, and the PR body provides live after-fix Gateway/Codex proof.

Merge readiness: Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Review details

Best possible solution: Make an explicit maintainer decision on the Codex compaction ownership boundary; if fail-closed native Codex ownership is intended, this PR is the coherent path, otherwise preserve compatibility and land a narrower routing fix such as #86292.

Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows the OpenClaw-owned Codex preflight, context-engine, wait/restart, and fallback compaction paths, and the PR body provides live after-fix Gateway/Codex proof.

Is this the best way to solve the issue? Likely yes if maintainers agree with the boundary decision. The patch consistently removes duplicate OpenClaw-owned Codex compaction and aligns docs/tests around native pending starts, but the fail-closed compatibility and availability impact needs owner approval.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3db1508f1ee7.
|
@clawsweeper re-review Addressed the CLI fallback gap: Codex native-harness compaction now fails closed in Validation rerun on the new head: |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
c39923d to 83e5f86
|
@clawsweeper re-review Updated this PR to leave automatic Codex-runtime compaction to Codex itself. OpenClaw now skips reply preflight and CLI automatic compaction for Codex runtime sessions, removes app-server context-engine forced compaction before/after Codex turn overflow handling, and still forwards explicit/manual/plugin compaction requests to native Codex Exact-head Real behavior proof is green on |
|
|
@clawsweeper re-review I added one more live proof point after the pending-state fix: after the real Gateway |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
Summary
Codex runtime compaction was still being treated like an OpenClaw-owned PI safeguard in a few places. OpenClaw could preflight-compact Codex reply turns, post-turn compact Codex CLI transcripts, let a context-engine plugin replace Codex app-server compaction, and force plugin compaction before or after Codex `turn/start` overflow handling.

That is the wrong boundary. Codex already owns automatic compaction inside the Codex runtime. OpenClaw should not predict when a Codex thread needs compaction and then rewrite the transcript around it. At best that duplicates the native harness. At worst it crosses runtime/auth boundaries, restarts the shared app-server, or drifts a Codex-backed OpenAI session into OpenClaw/public Responses summarization.

This tightens the contract. Automatic compaction for Codex runtime sessions is native Codex-owned. OpenClaw now skips its preflight and CLI automatic compaction paths for Codex, removes the Codex app-server context-engine forced-compaction paths, and no longer lets an `ownsCompaction` context engine take over Codex app-server compaction. Explicit compaction is still supported: `/compact` and plugin-requested/manual compaction requests are forwarded to Codex as native `thread/compact/start`, then OpenClaw returns immediately instead of waiting, timing out, restarting the shared app-server, or falling back to another summarizer.

The fail-closed behavior stays where it belongs. If native Codex compaction cannot be started because the native thread binding is missing or stale, OpenClaw reports that failure honestly. It does not restart the shared Codex app-server as a retry strategy, and it does not fall back to OpenClaw/public OpenAI compaction for Codex runtime sessions. Non-Codex pinned harnesses still keep their previous context-engine fallback path. The Codex harness docs now state that Codex owns native compaction and that context engines do not replace it.
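The contract described in this summary can be sketched as a single routing decision. This is a minimal illustration under stated assumptions, not the code in this PR: `routeCompaction`, `SessionInfo`, `CompactionOutcome`, and the `startNativeCompaction` callback are hypothetical names; only the `thread/compact/start` signal and the `codex-app-server` backend label come from the PR text.

```typescript
// Hypothetical types for illustration only; not the actual OpenClaw API.
type CompactionTrigger = "automatic" | "explicit";

interface SessionInfo {
  runtime: "codex" | "other";
  // Native Codex thread binding; undefined models a missing or stale binding.
  codexThreadId?: string;
}

type CompactionOutcome =
  | { kind: "skipped"; reason: string }
  | { kind: "failed"; reason: string }
  | { kind: "previous-path" }
  | {
      kind: "pending";
      backend: "codex-app-server";
      threadId: string;
      signal: "thread/compact/start";
    };

function routeCompaction(
  session: SessionInfo,
  trigger: CompactionTrigger,
  startNativeCompaction: (threadId: string) => void,
): CompactionOutcome {
  // Non-Codex harnesses keep whatever path they had before.
  if (session.runtime !== "codex") {
    return { kind: "previous-path" };
  }
  // Automatic compaction is Codex-owned, so the caller does nothing here.
  if (trigger === "automatic") {
    return { kind: "skipped", reason: "codex owns automatic compaction" };
  }
  // Fail closed: no app-server restart and no fallback summarizer.
  if (!session.codexThreadId) {
    return { kind: "failed", reason: "native thread binding is missing or stale" };
  }
  // Forward the start signal and return immediately without waiting.
  startNativeCompaction(session.codexThreadId);
  return {
    kind: "pending",
    backend: "codex-app-server",
    threadId: session.codexThreadId,
    signal: "thread/compact/start",
  };
}
```

The point of returning a `pending` outcome rather than awaiting a result is that the caller never owns a timeout, so there is no moment at which a retry or fallback could be triggered.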
Real behavior proof
Behavior addressed: Codex-runtime sessions should not be automatically compacted by OpenClaw preflight, CLI, context-engine, or public OpenAI summarization paths. Explicit/manual/plugin compaction should only start native Codex compaction through `thread/compact/start`, without OpenClaw waiting on completion, imposing its own timeout, restarting the shared app-server, counting the pending start as completed OpenClaw compaction, or falling back to another compaction backend.

Real environment tested: Pash's local dev agent was restarted from this branch at head `4b9b8ce0c065ce2e1cda49c027e4f3e2c0e3a295`. Gateway reported `2026.5.24` running from `/Users/pash/code/openclaw/dist/index.js`, pid `56665`, healthy loopback on `127.0.0.1:18789`. The live proof used the real `main` agent with Codex OAuth auth redacted, provider `openai-codex`, model `gpt-5.5`, and the embedded Codex app-server harness.

Exact steps or command run after this patch:
- `pnpm openclaw gateway restart`
- `pnpm openclaw gateway status --deep`
- `pnpm openclaw agent --agent main --session-key agent:main:codex-compaction-proof-85958-final-1779663763 --message "Final live proof turn 1 for PR 85958 on commit 81328167c4. Reply exactly: FINAL_PROOF_TURN_1_OK" --thinking minimal --timeout 600 --json`
- `pnpm openclaw agent --agent main --session-key agent:main:codex-compaction-proof-85958-final-1779663763 --message "Final live proof turn 2 for PR 85958 on the same session. Reply exactly: FINAL_PROOF_TURN_2_OK" --thinking minimal --timeout 600 --json`
- After a second gateway restart on the latest head: `pnpm openclaw agent --agent main --session-key agent:main:codex-compaction-proof-85958-pending-1779668199 --message "Live pending compact proof setup for PR 85958 on branch head 4b9b8ce0c0. Reply exactly: PENDING_PROOF_SETUP_OK" --thinking minimal --timeout 600 --json`
- A real Gateway `sessions.compact` RPC for that scratch session through `src/gateway/call.ts`
- Session-store before/after comparison of `/Users/pash/.openclaw/agents/main/sessions/sessions.json`
- A follow-up live turn on the same scratch session after the pending native compact-start signal: `pnpm openclaw agent --agent main --session-key agent:main:codex-compaction-proof-85958-pending-1779668199 --message "Post pending native compact-start live proof for PR 85958. Reply exactly: PENDING_PROOF_AFTER_COMPACT_OK" --thinking minimal --timeout 600 --json`
- `rg -n 'thread/compact/start|preflight_compacting|falling back to context engine|Summarization failed|provider_error_4xx|"api":"openai-responses"|"modelApi":"openai-responses"|contextCompaction|gateway_timeout|EMBEDDED FALLBACK' /Users/pash/.openclaw/agents/main/sessions/e15cac51-53a1-440a-b0bf-4200389bfc70*`
- `rg -n 'thread/compact|context_compacted|contextCompaction|Summarization failed|api.responses.write|openai-responses|Incorrect API key|falling back to context engine' /Users/pash/.openclaw/agents/main/agent/codex-home/sessions/2026/05/24/rollout-2026-05-24T17-16-47-019e5c7d-c8de-7570-b60b-96997e8da05d.jsonl`
- `pnpm test src/gateway/server.sessions.compaction.test.ts src/auto-reply/reply/commands-compact.test.ts extensions/codex/src/app-server/compact.test.ts -- --reporter=dot`
- `pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/compact.ts extensions/codex/src/app-server/compact.test.ts src/auto-reply/reply/commands-compact.test.ts src/gateway/server.sessions.compaction.test.ts`
- `git diff --check`
- `pnpm check:changed`
- `pnpm build`

Evidence after fix: Normal live turns stayed on the same Codex runtime session. Turn 1 returned exactly `FINAL_PROOF_TURN_1_OK` with run id `4a70de03-f779-488b-99d4-b2e4e362ea5b`, session `7b1d9666-d817-4182-b3a4-43ca985d530f`, Codex thread `019e5c3a-2533-7321-845e-44a55ab1545b`, provider `openai-codex`, model `gpt-5.5`, `agentHarnessId: codex`, `fallbackUsed: false`, and runner `embedded`. Turn 2 returned exactly `FINAL_PROOF_TURN_2_OK` with run id `dfbb3821-57c6-44e7-972e-47e66a71f3ee` and the same session/provider/model/harness.

The latest-head scratch setup turn returned exactly `PENDING_PROOF_SETUP_OK` with run id `4a420ac5-9a08-42f8-9054-c8a67adefec9`, session `e15cac51-53a1-440a-b0bf-4200389bfc70`, Codex thread `019e5c7d-c8de-7570-b60b-96997e8da05d`, provider `openai-codex`, model `gpt-5.5`, `agentHarnessId: codex`, `fallbackUsed: false`, runner `embedded`, and `promptTokens: 46541`. The real Gateway `sessions.compact` call returned `ok: true`, `compacted: false`, `tokensAfter: null`, and details `{ "backend": "codex-app-server", "threadId": "019e5c7d-c8de-7570-b60b-96997e8da05d", "signal": "thread/compact/start", "pending": true }`. The session store was unchanged immediately after the pending native compact start: `compactionCount` remained unset, `totalTokens` remained `46541`, and `totalTokensFresh` remained `true`. The Gateway log recorded `started codex app-server compaction` for the same session and thread.

The follow-up turn after that compact-start signal returned exactly `PENDING_PROOF_AFTER_COMPACT_OK` with run id `3fca6dbc-9bab-4d8d-847b-250138764fda`, the same session id, provider `openai-codex`, model `gpt-5.5`, `agentHarnessId: codex`, `fallbackUsed: false`, and runner `embedded`. After that normal turn, the session store still had `compactionCount` unset and updated only normal token bookkeeping to `totalTokens: 50857`. The Codex rollout recorded the expected native `context_compacted` event for thread `019e5c7d-c8de-7570-b60b-96997e8da05d`, showing Codex completed its own compaction path after OpenClaw started it.

The session-file grep found no OpenClaw preflight compaction, context-engine fallback, public `openai-responses` metadata, Gateway timeout fallback, or embedded fallback artifacts. The Codex rollout grep found no Codex compaction failure, public OpenAI Responses summarization, `api.responses.write` failure, or context-engine fallback strings.

Observed result after fix: The live dev agent stayed on the Codex runtime for normal turns and explicit compact-start proof, resumed Codex-backed scratch sessions correctly after Gateway restarts, and did not drift into OpenClaw/public Responses compaction. Explicit compaction now behaves as a native Codex start signal: OpenClaw forwards `thread/compact/start`, reports the start as pending rather than completed, and leaves compaction counters alone while Codex owns the async result internally. The tests lock this down across the app-server, channel `/compact`, and Gateway `sessions.compact` seams so future changes do not reintroduce OpenClaw-owned Codex compaction fallbacks or completed-state bookkeeping for pending native starts.

What was not tested: I did not force a destructive live compaction timeout on Pash's running personal dev-agent conversation, and I did not trigger a human-authored Discord or Telegram `/compact` message from a live chat client. The actual Gateway `sessions.compact` path was exercised against a real Codex-backed scratch session on the running dev agent, and the channel command behavior is covered by focused tests around the same pending native result shape.
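The bookkeeping rule that the proof exercises (a pending native start must not be counted as a completed compaction) can be sketched as follows. The field names `ok`, `compacted`, `tokensAfter`, `details`, `compactionCount`, `totalTokens`, and `totalTokensFresh` are taken from the evidence above; the `applyCompactResult` function, both interfaces, and the behavior of the completed branch are assumptions made for illustration, not the code in this PR.

```typescript
// Result shape as reported by the live sessions.compact call in the proof.
interface CompactResult {
  ok: boolean;
  compacted: boolean;
  tokensAfter: number | null;
  details?: {
    backend: string;
    threadId: string;
    signal: string;
    pending?: boolean;
  };
}

// Only the session-store fields mentioned in the proof are modeled here.
interface SessionEntry {
  compactionCount?: number;
  totalTokens: number;
  totalTokensFresh: boolean;
}

// Hypothetical helper: returns the entry unchanged unless compaction completed.
function applyCompactResult(entry: SessionEntry, result: CompactResult): SessionEntry {
  const pending = result.details?.pending === true;
  if (!result.ok || !result.compacted || pending) {
    return entry; // pending or failed: no counter bump, no token rewrite
  }
  // Assumed completed-path bookkeeping, shown only for contrast.
  return {
    ...entry,
    compactionCount: (entry.compactionCount ?? 0) + 1,
    totalTokens: result.tokensAfter ?? entry.totalTokens,
  };
}

// Values copied from the live proof above.
const pendingStart: CompactResult = {
  ok: true,
  compacted: false,
  tokensAfter: null,
  details: {
    backend: "codex-app-server",
    threadId: "019e5c7d-c8de-7570-b60b-96997e8da05d",
    signal: "thread/compact/start",
    pending: true,
  },
};
const storeBefore: SessionEntry = { totalTokens: 46541, totalTokensFresh: true };
const storeAfter = applyCompactResult(storeBefore, pendingStart);
```

With these inputs `storeAfter` is the same object as `storeBefore`, which mirrors the unchanged session store observed immediately after the pending compact start.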