fix(codex): keep interrupted turns visible-answer eligible#84494
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7405a31b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Codex review: needs maintainer review before merge. Latest ClawSweeper review: 2026-05-22 10:17 UTC / May 22, 2026, 6:17 AM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. Current main source maps raw app-server PR rating What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land the narrow projection change if maintainers accept the explicit-abort contract, and leave adjacent timeout-default and long-reply truncation behavior in their separate follow-ups. Do we have a high-confidence way to reproduce the issue? Yes. Current main source maps raw app-server Is this the best way to solve the issue? Yes, with the contract caveat. Removing only the blanket interrupted-to-aborted mapping while preserving explicit abort, timeout, abort-marker, and abort-controller paths is the narrowest maintainable fix I found. Label changes:
Label justifications:
Acceptance criteria:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 6bd430ee3517. |
e7405a3 to
16981a7
Compare
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Pearl Test Hopper Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
|
Live behavior proof for PR #84494 Commit under test: What was run: OPENCLAW_LIVE_TEST=1 \
OPENCLAW_LIVE_CODEX_HARNESS=1 \
OPENCLAW_LIVE_CODEX_HARNESS_AUTH=codex-auth \
OPENCLAW_LIVE_CODEX_HARNESS_CODE_MODE_ONLY=1 \
OPENCLAW_LIVE_CODEX_HARNESS_SUBAGENT_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_CHAT_IMAGE_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_REQUEST_TIMEOUT_MS=600000 \
timeout 1200s node scripts/test-live.mjs --codex-harness -- \
src/gateway/gateway-codex-harness.live.test.tsResult: This is the real OpenClaw gateway -> plugin-owned Codex app-server path. It shows:
Additional raw Codex app-server proof: This raw app-server proof exercised a no-output shell command ( Cancellation/interrupt proof: This confirms explicit interruption still remains distinguishable and does not synthesize a fake final answer. Crabbox status: I did not copy local Codex auth secrets to the lab VM. The prior focused regression test did pass through Crabbox on the SSH target: Security note: local paths, thread IDs, tool call IDs, account identity, and private lab host details were redacted from this public proof. |
|
@clawsweeper re-review Live behavior proof has been supplied in #84494 (comment). The Real behavior proof check is passing, and the proof includes gateway Codex app-server tool-followed-by-final-output plus explicit turn/interrupt behavior. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
16981a7 to
551ffd1
Compare
|
@clawsweeper re-review I added the exact diagnostic proof requested and pushed commit New proof in PR body covers:
Focused local proof passed: |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
551ffd1 to
2010177
Compare
|
@clawsweeper re-review I replaced the temporary cross-boundary diagnostic with the final proof layout and pushed commit Exact proof now covers:
Focused local proof passed and Crabbox SSH proof passed with the corrected configs; PR body has the exact commands and redacted output. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review I added exact runtime proof to the PR body under |
|
CI status update after exact runtime proof:
A maintainer rerun of the failed jobs is needed, or a maintainer can classify these CI failures as unrelated to this PR. |
|
maintainer follow-up after reviewing this with the wider Codex app-server stack: #83200/#83222, #83476, #84135/#84974, #84137, #84492, and #84516. I rebased this PR onto current proof:
scope boundary: this is the #84492 interrupted/tool-only visibility fix. It should not be treated as closing #84516's long-reply/truncated-final-payload behavior or the #84137 post-tool raw assistant semantic decision. |
2bff2ae to
48cc43a
Compare
48cc43a to
a5d1958
Compare
a5d1958 to
256cc5f
Compare
256cc5f to
90fb636
Compare
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…84494) * fix(codex): keep interrupted turns visible-answer eligible * docs(changelog): note codex interrupted recovery --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Summary
turn.status: "interrupted"was projected as an OpenClaw abort even when OpenClaw did not explicitly cancel the run.abortedunless OpenClaw explicitly calledmarkAborted()ormarkTimedOut().Motivation
interruptedterminal status was mapped toaborted: true.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
origin/mainplus Crabbox static SSH Linux VM. Private host/user/workroot values were redacted.Root Cause (if applicable)
CodexAppServerEventProjectorconvertedturn.status === "interrupted"intothis.aborted = trueand returnedaborted: this.aborted || turnInterrupted.Regression Test Plan (if applicable)
extensions/codex/src/app-server/event-projector.test.tssrc/agents/pi-embedded-runner/run.incomplete-turn.test.tsinterrupteddoes not by itself become OpenClawaborted.User-visible / Behavior Changes
User-facing Codex turns that end with app-server
interruptedbut no explicit OpenClaw cancellation can now reach the existing no-visible-answer guard instead of being classified as aborted.Diagram (if applicable)
Security Impact (required)
Yes/No): NoYes/No): NoYes/No): NoYes/No): NoYes/No): NoYes, explain risk + mitigation: N/ARepro + Verification
Environment
v24.15.0locally and on the VMSteps
origin/main.Expected
interrupteddid not automatically become OpenClawaborted.Actual
Evidence
Local focused tests after rebase:
Human Verification (required)
main.markAborted()/markTimedOut()paths remained separate.promptError.Review Conversations
Compatibility / Migration
Yes/No): YesYes/No): NoYes/No): NoRisks and Mitigations
interruptedstatuses may correspond to genuine app-server-side interruptions.aborted/promptError.Exact interrupted tool-only recovery proof
Added diagnostic regression coverage at commit
2010177bc7for the exact path ClawSweeper requested:turn/completed status=interrupted.commandExecutionwith empty aggregated output andexitCode=0.assistantTexts=[].aborted=false,timedOut=false, bash tool metadata preserved.couldn't generate a response.nullfromresolveIncompleteTurnPayloadText(...), so user cancellation does not synthesize a fake final answer.Focused proof commands run locally:
Crabbox SSH proof on the lab VM also passed for the exact diagnostic pair:
Exact runtime interrupted tool-only recovery proof
After the focused tests, I also ran a runtime diagnostic through the real OpenClaw modules from commit
2010177bc7rather than a Vitest assertion-only path. The script instantiatedCodexAppServerEventProjector, fed a redacted Codex app-serverturn/completednotification withstatus="interrupted", one successfulcommandExecutionitem,aggregatedOutput="", andexitCode=0, then passed the projected attempt intoresolveIncompleteTurnPayloadText(...).Redacted local runtime output:
Redacted Crabbox static SSH runtime output from the lab VM:
This is the exact after-fix runtime path requested by ClawSweeper: app-server
interrupted, tool-only/no-assistant, no explicit OpenClaw abort, no timeout, no synthesized assistant answer, and the existing no-visible-answer recovery text remained reachable. The explicit-cancellation counter-proof stayed distinct: aftermarkAborted(), the same interrupted/tool-only shape projectedaborted=trueand producedrecovery=null.