fix(codex): surface native compaction failures #85160
Codex review: needs maintainer review before merge.
Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, for the source-level path: current main emits terminal lifecycle before post-turn maintenance and swallows CLI compaction errors under transcript persistence, while #84305 provides production traces with over-window Codex turns and [...]
Review details
Best possible solution: Land only after maintainers accept the fail-closed Codex compaction semantics and are satisfied that the stale-binding fallback plus session-accounting coverage are enough for upgrade safety.
Do we have a high-confidence way to reproduce the issue? Yes, for the source-level path: current main emits terminal lifecycle before post-turn maintenance and swallows CLI compaction errors under transcript persistence, while #84305 provides production traces with over-window Codex turns and [...]
Is this the best way to solve the issue? Yes, conditionally: routing Codex sessions through native app-server compaction and delaying terminal lifecycle until durable maintenance completes is the right ownership boundary. The remaining question is maintainer acceptance of the fail-closed compatibility behavior.
Codex review notes: model gpt-5.5, reasoning high; reviewed against c8a35c4645dc.
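The ordering the review describes (terminal lifecycle only after durable post-turn maintenance, with failures surfaced rather than swallowed) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TurnHooks`, `LifecycleEvent`, and `finishTurnFailClosed` are hypothetical names, and the real runtime emits richer events.

```typescript
// Hypothetical types and names; a sketch of the ordering only, not OpenClaw's API.
type LifecycleEvent =
  | { kind: "end" }
  | { kind: "error"; message: string };

interface TurnHooks {
  // Runs the model turn itself.
  runTurn(): Promise<void>;
  // Durable post-turn maintenance, e.g. native compaction plus session accounting.
  runPostTurnMaintenance(): Promise<void>;
  // Publishes a lifecycle event to listeners such as a TUI.
  emit(event: LifecycleEvent): void;
}

// Returns true when the turn finished cleanly, false when maintenance failed.
async function finishTurnFailClosed(hooks: TurnHooks): Promise<boolean> {
  await hooks.runTurn();
  try {
    // Maintenance is awaited before any terminal event is emitted.
    await hooks.runPostTurnMaintenance();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Fail closed: report the failure instead of announcing a clean end.
    hooks.emit({ kind: "error", message });
    return false;
  }
  hooks.emit({ kind: "end" });
  return true;
}
```

With this shape a listener cannot observe the terminal event while maintenance is still running, and a maintenance failure reaches the listener as an error instead of being dropped.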
Upgrade-path live smoke note from the follow-up testing: I ran these through the product entrypoint ( What the live lanes proved:
Important nuance: the two Codex binding upgrade lanes are true product-entrypoint upgrade smokes, but they do not force the internal missing/stale-binding fallback branch during post-turn compaction. In a normal live Codex turn, the runtime repairs or recreates the app-server thread binding before the post-turn compaction lifecycle sees it. So those lanes prove that old sessions survive live upgrade behavior, while the deterministic unit tests remain the proof for the exact missing/stale native-compaction fallback branches.
b06072c to da4331f
* fix(codex): surface native compaction failures
* docs: add changelog for codex compaction fix
* test: align compaction failure fixtures
Summary
Verification
* `node scripts/run-vitest.mjs extensions/codex/src/app-server/compact.test.ts src/agents/harness/selection.test.ts src/agents/command/cli-compaction.test.ts src/tui/tui-event-handlers.test.ts src/agents/agent-command.live-model-switch.test.ts`: 8 files, 193 tests passed.
* `node scripts/run-vitest.mjs extensions/codex/src/app-server/compact.test.ts src/agents/agent-command.live-model-switch.test.ts`: 3 files, 65 tests passed after rebase conflict resolution.
* `node scripts/run-vitest.mjs src/tui/tui-event-handlers.test.ts`: 1 file, 50 tests passed for the post-final lifecycle regression.
* `git diff --check` passed.
* `$autoreview` (`AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode auto --reviewer codex --fallback-reviewer none`): clean, no accepted/actionable findings.
Real behavior proof
Behavior addressed: Codex-native post-turn compaction failures are no longer silent; successful native compaction waits for completion and records fresh token/session state before the run becomes idle or allows the next local turn.
Real environment tested: Live OpenClaw TUI/local Codex runtime using `openai-codex/gpt-5.5`, plus release-grade external smoke lanes for Discord, Slack, Telegram, and WhatsApp where credentials/infrastructure allowed.
Exact steps or command run after this patch: A final focused fix-proof lane with a small `contextTokens` budget drove successful Codex-native compaction across two TUI turns, then a forced native compaction timeout lane verified the user-visible failure path. External matrix artifacts: `.artifacts/qa-e2e/issue-84305-release-smoke-2026-05-21T23-47-59-559Z/matrix-summary.json` and `.artifacts/qa-e2e/issue-84305-telegram-whatsapp-retry-2026-05-22T00-25-46-440Z/matrix-summary.json`.
Evidence after fix: Success lane logged `started codex app-server compaction` and `completed codex app-server compaction` on both turns; session state showed `contextTokens: 4000`, `modelProvider: openai-codex`, `model: gpt-5.5`, `agentHarnessId: codex`, `totalTokensFresh: true`, and compaction count incrementing to 2. Failure lane exited 1 with `CLI native harness compaction failed for openai-codex/gpt-5.5: timed out waiting for codex app-server compaction...`, with no auth/provider failure.
Observed result after fix: Successful Codex-native compaction completed and persisted before idle; forced native compaction failure surfaced clearly instead of letting an over-budget session continue uncompacted. External smoke matrix product results: Codex Discord normal passed 2/2, Codex Discord forced-compaction passed 3/3, Codex Slack forced-compaction passed 7/7, Telegram normal retry passed 2/2; Telegram canary and WhatsApp were blocked by environment/credential/infrastructure limits rather than demonstrated branch regressions.
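The failure lane amounts to a bounded wait on compaction followed by a rethrow with context, so the caller can exit non-zero. A minimal sketch of that pattern, assuming hypothetical helpers `withTimeout` and `compactOrFail` (neither name comes from this PR); the message shape only mirrors the prefix quoted in the evidence:

```typescript
// Hypothetical helpers; names, signatures, and timeout values are illustrative assumptions.

// Rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out waiting for ${label}`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Runs compaction with a deadline and rethrows with context instead of swallowing.
async function compactOrFail(
  compact: () => Promise<void>,
  modelRef: string,
  timeoutMs: number,
): Promise<void> {
  try {
    await withTimeout(compact(), timeoutMs, "codex app-server compaction");
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`CLI native harness compaction failed for ${modelRef}: ${reason}`);
  }
}
```

A caller that lets this error propagate to the process boundary gets the non-zero exit that the failure lane observed, rather than an over-budget session that silently continues.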
What was not tested: The GitHub `Real behavior proof` CI check is expected to be ignored for this PR per maintainer instruction. WhatsApp could not be product-validated because the available credential was logged out/401, and some Telegram canary retries timed out in infrastructure.
Fixes #84305