
fix(codex): keep interrupted turns visible-answer eligible#84494

Merged
vincentkoc merged 2 commits into openclaw:main from rozmiarD:fix/codex-app-server-interrupted-turns on May 22, 2026

Conversation

@rozmiarD rozmiarD commented May 20, 2026

Contributor

Summary

  • Problem: Codex app-server turn.status: "interrupted" was projected as an OpenClaw abort even when OpenClaw did not explicitly cancel the run.
  • Solution: Keep app-server interrupted as terminal app-server state, but do not map it to OpenClaw aborted unless OpenClaw explicitly called markAborted() or markTimedOut().
  • What changed: Added projector and runner regressions for interrupted/tool-only/no-visible-answer Codex turns, including sparse successful bash output.
  • What did NOT change (scope boundary): This does not treat every interrupted turn as success, does not synthesize final assistant text, does not disable cancellation, and does not change failed-turn error handling.

Motivation

  • Codex-backed dashboard turns can appear to stop after tool output without delivering the final visible assistant answer.
  • The existing incomplete-turn/no-visible-answer guard was bypassed because the app-server interrupted terminal status was mapped to aborted: true.
  • This materially affects UX for Codex-backed OpenClaw usage: the user sees tool-only progress and then needs a continuation turn instead of receiving the answer that the original user-facing turn owed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Codex app-server interrupted turn status no longer suppresses no-visible-answer handling for user-facing turns.
  • Real environment tested: Local OpenClaw dev checkout rebased on origin/main plus Crabbox static SSH Linux VM. Private host/user/workroot values were redacted.
  • Exact steps or command run after this patch:
crabbox run --provider ssh --target linux --static-host <redacted-lab-host> \
  --static-user <redacted-user> --static-port 22 \
  --static-work-root <redacted-workroot> --shell -- \
  'export PATH="$HOME/.local/bin:$PATH"; node --version; pnpm --version;
   pnpm install --frozen-lockfile --ignore-scripts;
   CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts \
     extensions/codex/src/app-server/event-projector.test.ts \
     -t "app-server interrupted status|sparse successful bash output";
   CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts \
     src/agents/pi-embedded-runner/run.incomplete-turn.test.ts \
     -t "sparse bash output"'
  • Evidence after fix:
event-projector.test.ts: Test Files 1 passed; Tests 2 passed | 58 skipped
run.incomplete-turn.test.ts: Test Files 1 passed; Tests 1 passed | 95 skipped
run summary sync=2.536s command=32.891s total=35.452s exit=0
lease cleanup stopped=true policy=auto
  • Observed result after fix: The focused interrupted-turn and sparse-bash regressions passed remotely through Crabbox.
  • What was not tested: A full dashboard browser interaction was not run. Full monorepo tests were not run.
  • Before evidence:
FAIL extensions/codex/src/app-server/event-projector.test.ts > CodexAppServerEventProjector > does not treat app-server interrupted status as a user cancellation by itself
AssertionError: expected true to be false

Root Cause (if applicable)

  • Root cause: CodexAppServerEventProjector converted turn.status === "interrupted" into this.aborted = true and returned aborted: this.aborted || turnInterrupted.
  • Missing detection / guardrail: There was no regression covering an app-server interrupted terminal turn with no explicit OpenClaw cancellation and no visible assistant answer.
  • Contributing context (if known): The embedded runner intentionally skips incomplete-turn/no-visible-answer handling for true aborts. That made the projector's over-broad status mapping user-visible.
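The root cause can be reduced to a pure function. The sketch below is illustrative only: `AbortInputs`, `abortedBeforeFix`, `abortedAfterFix`, and `noVisibleAnswerGuardRuns` are hypothetical names, not the real `CodexAppServerEventProjector` or runner code. It models just the `aborted: this.aborted || turnInterrupted` expression and the runner's skip-on-abort rule described above.

```typescript
// Hypothetical model of the projector's abort decision; not the real code.
type TurnStatus = "completed" | "interrupted" | "failed";

interface AbortInputs {
  explicitlyAborted: boolean; // what an explicit markAborted()-style call would set
  turnStatus: TurnStatus; // raw app-server turn.status
}

// Pre-fix shape: `aborted: this.aborted || turnInterrupted`.
function abortedBeforeFix(input: AbortInputs): boolean {
  const turnInterrupted = input.turnStatus === "interrupted";
  return input.explicitlyAborted || turnInterrupted;
}

// Post-fix shape: raw "interrupted" alone no longer implies an OpenClaw abort.
function abortedAfterFix(input: AbortInputs): boolean {
  return input.explicitlyAborted;
}

// The embedded runner skips incomplete-turn/no-visible-answer handling for aborts.
function noVisibleAnswerGuardRuns(aborted: boolean): boolean {
  return !aborted;
}

const rawInterrupt: AbortInputs = { explicitlyAborted: false, turnStatus: "interrupted" };
console.log("before fix, guard runs:", noVisibleAnswerGuardRuns(abortedBeforeFix(rawInterrupt))); // false
console.log("after fix, guard runs:", noVisibleAnswerGuardRuns(abortedAfterFix(rawInterrupt))); // true
```

Keeping the decision pure makes the contract easy to state: only explicit OpenClaw cancellation evidence may suppress the guard.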

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • extensions/codex/src/app-server/event-projector.test.ts
    • src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
  • Scenario the test should lock in:
    • App-server interrupted does not by itself become OpenClaw aborted.
    • Sparse successful bash output plus no visible final assistant text remains eligible for no-visible-answer handling.
    • Replay-unsafe shell activity surfaces a verification warning instead of silently retrying or faking an answer.
  • Why this is the smallest reliable guardrail: The bug was a local projection/lifecycle mapping problem plus runner incomplete-turn classification; the focused tests cover both without live model nondeterminism.
  • Existing test that already covers this (if any): None found.
  • If no new test is added, why not: N/A.
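The scenarios to lock in can be expressed as a small, self-contained classifier. This is a hypothetical sketch. The `Attempt` shape, the `replaySafe` flag, and the outcome names are assumptions for illustration, not the runner's real types. It also assumes that a shell command counts as replay-unsafe and that replay-safe activity could be retried.

```typescript
// Hypothetical attempt shape; field names are illustrative assumptions.
interface ToolMeta {
  toolName: string;
  replaySafe: boolean; // assumption: shell commands are not safe to replay
}

interface Attempt {
  aborted: boolean;
  timedOut: boolean;
  assistantTexts: string[];
  toolMetas: ToolMeta[];
}

type Outcome = "answered" | "aborted-or-timed-out" | "retry" | "verify-warning";

function classifyAttempt(attempt: Attempt): Outcome {
  // Explicit OpenClaw cancellation or timeout keeps its own handling.
  if (attempt.aborted || attempt.timedOut) return "aborted-or-timed-out";

  // Tool output, even a successful bash run with empty output, is not a
  // visible assistant answer; only non-blank assistant text counts.
  const hasVisibleAnswer = attempt.assistantTexts.some((text) => text.trim().length > 0);
  if (hasVisibleAnswer) return "answered";

  // Re-running side-effecting tools could repeat them, so surface a warning
  // instead of silently retrying or inventing an answer.
  const replayUnsafe = attempt.toolMetas.some((meta) => !meta.replaySafe);
  return replayUnsafe ? "verify-warning" : "retry";
}

// Sparse successful bash output, no assistant text, no explicit abort.
const sparseBash: Attempt = {
  aborted: false,
  timedOut: false,
  assistantTexts: [],
  toolMetas: [{ toolName: "bash", replaySafe: false }],
};
console.log(classifyAttempt(sparseBash)); // verify-warning
```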

User-visible / Behavior Changes

User-facing Codex turns that end with app-server interrupted but no explicit OpenClaw cancellation can now reach the existing no-visible-answer guard instead of being classified as aborted.

Diagram (if applicable)

Before:
Codex turn/completed(status=interrupted) -> projector aborted=true -> incomplete-turn guard skipped -> no final visible answer handling

After:
Codex turn/completed(status=interrupted) -> projector aborted unchanged -> incomplete-turn guard remains active -> retry/error handling stays visible

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): No
  • New/changed network calls? (Yes/No): No
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux x64 local dev and Linux x64 Crabbox SSH VM
  • Runtime/container: Node v24.15.0 locally and on the VM
  • Model/provider: Codex app-server/OpenAI Codex path modeled by focused projector tests
  • Integration/channel (if any): OpenClaw Codex app-server harness
  • Relevant config (redacted): Crabbox static SSH provider with redacted host/user/workroot

Steps

  1. Rebased the branch on the latest official origin/main.
  2. Ran the focused local test files.
  3. Ran the focused regression selections remotely through Crabbox static SSH.

Expected

  • App-server interrupted does not automatically become OpenClaw aborted.
  • Sparse successful bash output does not count as a final visible assistant answer.
  • Replay-unsafe tool activity surfaces the existing verification warning path.

Actual

  • Matched expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Local focused tests after rebase:

CI=1 timeout 240s node scripts/run-vitest.mjs run \
  --config test/vitest/vitest.extensions.config.ts \
  extensions/codex/src/app-server/event-projector.test.ts

Test Files  1 passed (1)
Tests  60 passed (60)
CI=1 timeout 240s node scripts/run-vitest.mjs run \
  --config test/vitest/vitest.agents-pi-embedded.config.ts \
  src/agents/pi-embedded-runner/run.incomplete-turn.test.ts

Test Files  1 passed (1)
Tests  96 passed (96)

Human Verification (required)

  • Verified scenarios:
    • Pre-fix focused regression failed on current main.
    • Local focused files passed after the patch and after rebase.
    • Crabbox SSH remote proof passed after rebase.
  • Edge cases checked:
    • Explicit markAborted()/markTimedOut() paths remained separate.
    • Failed turns still set promptError.
    • Replay-unsafe sparse bash output surfaced a verification warning instead of being silently retried.
  • What was not verified:
    • Full dashboard browser UX.
    • Full monorepo test suite.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): No
  • Migration needed? (Yes/No): No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Some app-server interrupted statuses may correspond to genuine app-server-side interruptions.
    • Mitigation: The patch does not mark them successful and does not fake a reply; it only avoids treating them as OpenClaw/user aborts. Explicit OpenClaw abort and timeout paths still set aborted/promptError.

Exact interrupted tool-only recovery proof

Added diagnostic regression coverage at commit 2010177bc7 for the exact path ClawSweeper requested:

  • Codex app-server terminal projection: turn/completed status=interrupted.
  • Prior turn item: successful bash commandExecution with empty aggregated output and exitCode=0.
  • No assistant text was projected: assistantTexts=[].
  • OpenClaw projection after the patch: aborted=false, timedOut=false, bash tool metadata preserved.
  • The no-visible-answer guard was exercised with a projector-compatible interrupted tool-only attempt and produced recovery text containing "couldn't generate a response".
  • Explicit cancellation counter-proof remained distinct: an explicitly aborted interrupted/tool-only attempt returned null from resolveIncompleteTurnPayloadText(...), so user cancellation does not synthesize a fake final answer.

Focused proof commands run locally:

CI=1 timeout 240s /home/probo/.local/share/pnpm/store/v11/links/@/node/24.15.0/73ce00437ab4f9c458daa7174613927ead616d9a77c35c8b15545d4dc2ac94a0/node_modules/node/bin/node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/event-projector.test.ts
# Test Files 1 passed (1)
# Tests 61 passed (61)

CI=1 timeout 240s /home/probo/.local/share/pnpm/store/v11/links/@/node/24.15.0/73ce00437ab4f9c458daa7174613927ead616d9a77c35c8b15545d4dc2ac94a0/node_modules/node/bin/node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts -t "app-server interrupted tool-only output"
# Test Files 1 passed (1)
# Tests 1 passed | 96 skipped (97)

CI=1 timeout 240s /home/probo/.local/share/pnpm/store/v11/links/@/node/24.15.0/73ce00437ab4f9c458daa7174613927ead616d9a77c35c8b15545d4dc2ac94a0/node_modules/node/bin/node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "keeps upstream cancellation aborted"
# Test Files 1 passed (1)
# Tests 1 passed | 185 skipped (186)

Crabbox SSH proof on the lab VM also passed for the exact diagnostic pair:

CRABBOX_CONFIG=/home/probo/.config/crabbox/lab-ssh.yaml CRABBOX_SSH_KEY=/home/probo/.ssh/crabbox_lab_ed25519 timeout 1200s /home/probo/.local/bin/crabbox run --provider ssh --target linux --static-host [redacted] --static-user [redacted] --static-port 22 --static-work-root /home/[redacted]/crabbox --shell -- 'export PATH="$HOME/.local/bin:$PATH"; CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/event-projector.test.ts -t "sparse successful bash output|explicit cancellation marked aborted"; CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts -t "app-server interrupted tool-only output"'
# event-projector: Test Files 1 passed (1); Tests 2 passed | 59 skipped (61)
# incomplete-turn: Test Files 1 passed (1); Tests 1 passed | 96 skipped (97)
# lease cleanup stopped=true

Exact runtime interrupted tool-only recovery proof

After the focused tests, I also ran a runtime diagnostic through the real OpenClaw modules from commit 2010177bc7 rather than a Vitest assertion-only path. The script instantiated CodexAppServerEventProjector, fed a redacted Codex app-server turn/completed notification with status="interrupted", one successful commandExecution item, aggregatedOutput="", and exitCode=0, then passed the projected attempt into resolveIncompleteTurnPayloadText(...).

Redacted local runtime output:

[runtime-proof] event=turn/completed status=interrupted item=commandExecution command=bash aggregatedOutputLength=0 exitCode=0
[runtime-proof] projected aborted=false timedOut=false assistantTextCount=0 toolMetas=[{"toolName":"bash","meta":"ps -eo pid,ppid,stat,cmd | rg 'venv-roadmap|pytest|run_security_contract_validation|validate_public_install|git push|ap… (workspace)"}]
[runtime-proof] noVisibleAnswerRecovery="⚠️ Agent couldn't generate a response. Please try again."
[runtime-proof] explicitCancellation projectedAborted=true recovery=null

Redacted Crabbox static SSH runtime output from the lab VM:

[runtime-proof] event=turn/completed status=interrupted item=commandExecution command=bash aggregatedOutputLength=0 exitCode=0
[runtime-proof] projected aborted=false timedOut=false assistantTextCount=0 toolMetas=[{"toolName":"bash","meta":"ps -eo pid,ppid,stat,cmd | rg 'venv-roadmap|pytest|run_security_contract_validation|validate_public_install|git push|ap… (workspace)"}]
[runtime-proof] noVisibleAnswerRecovery="⚠️ Agent couldn't generate a response. Please try again."
[runtime-proof] explicitCancellation projectedAborted=true recovery=null
command complete in 25.156s total=27.671s
run summary sync=2.496s command=25.156s total=27.671s sync_skipped=false exit=0
lease cleanup stopped=true policy=auto

This is the exact after-fix runtime path requested by ClawSweeper: app-server interrupted, tool-only/no-assistant, no explicit OpenClaw abort, no timeout, no synthesized assistant answer, and the existing no-visible-answer recovery text remained reachable. The explicit-cancellation counter-proof stayed distinct: after markAborted(), the same interrupted/tool-only shape projected aborted=true and produced recovery=null.
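The runtime diagnostic follows a notification, then projection, then recovery-text pipeline. The sketch below mirrors that flow with hypothetical, trimmed types. `projectTurn` and `recoveryTextFor` are illustrative stand-ins, not the real `CodexAppServerEventProjector` or `resolveIncompleteTurnPayloadText(...)`, whose signatures this PR does not show. Only the recovery string is copied from the logged output.

```typescript
// Hypothetical, trimmed shapes modeled on the [runtime-proof] log lines.
interface CommandExecutionItem {
  type: "commandExecution";
  command: string;
  aggregatedOutput: string;
  exitCode: number;
}

interface TurnCompleted {
  method: "turn/completed";
  status: "completed" | "interrupted" | "failed";
}

interface Projection {
  aborted: boolean;
  appServerInterrupted: boolean;
  assistantTexts: string[];
  toolNames: string[];
}

const NO_VISIBLE_ANSWER = "⚠️ Agent couldn't generate a response. Please try again.";

// Stand-in projector: only an explicit OpenClaw abort sets `aborted`.
function projectTurn(
  turn: TurnCompleted,
  items: CommandExecutionItem[],
  explicitlyAborted: boolean,
): Projection {
  return {
    aborted: explicitlyAborted,
    appServerInterrupted: turn.status === "interrupted",
    assistantTexts: [], // tool-only turn: no agentMessage items were projected
    toolNames: items.map((item) => item.command),
  };
}

// Stand-in for the no-visible-answer guard: explicit cancellation yields null.
function recoveryTextFor(projection: Projection): string | null {
  if (projection.aborted) return null;
  return projection.assistantTexts.length === 0 ? NO_VISIBLE_ANSWER : null;
}

const turn: TurnCompleted = { method: "turn/completed", status: "interrupted" };
const bash: CommandExecutionItem = {
  type: "commandExecution",
  command: "bash",
  aggregatedOutput: "",
  exitCode: 0,
};

// Run the same interrupted/tool-only shape without and with an explicit abort.
for (const explicitlyAborted of [false, true]) {
  const projection = projectTurn(turn, [bash], explicitlyAborted);
  console.log(
    `[sketch] aborted=${projection.aborted} recovery=${JSON.stringify(recoveryTextFor(projection))}`,
  );
}
```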

@openclaw-barnacle openclaw-barnacle Bot added the agents (Agent runtime and tooling), extensions: codex, size: S, and proof: supplied (External PR includes structured after-fix real behavior proof.) labels May 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7405a31b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/codex/src/app-server/event-projector.ts
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-22 10:17 UTC / May 22, 2026, 6:17 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
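The command rules above can be modeled as a small parser plus a permission check. This is a hypothetical sketch inferred only from the listed commands. `parseCommand`, `mayRun`, `startsRepair`, and the role names are illustrative, and the real ClawSweeper parser may differ, for example in case handling or trailing text.

```typescript
// Hypothetical model of the command rules listed above; not the bot's real parser.
type Role = "author" | "writer" | "maintainer" | "other";
type CommandKind = "fresh-review" | "repair" | "control";

interface BotCommand {
  name: string;
  kind: CommandKind;
  maintainerOnly: boolean;
}

const COMMANDS: BotCommand[] = [
  { name: "re-review", kind: "fresh-review", maintainerOnly: false },
  { name: "re-run", kind: "fresh-review", maintainerOnly: false },
  { name: "review", kind: "fresh-review", maintainerOnly: true },
  { name: "autofix", kind: "repair", maintainerOnly: true },
  { name: "automerge", kind: "repair", maintainerOnly: true },
  { name: "fix ci", kind: "repair", maintainerOnly: true },
  { name: "address review", kind: "repair", maintainerOnly: true },
  { name: "explain", kind: "control", maintainerOnly: true },
  { name: "stop", kind: "control", maintainerOnly: true },
];

// Takes the rest of the line after the mention and matches a known command.
function parseCommand(comment: string): BotCommand | undefined {
  const match = /@clawsweeper\s+([^\n]+)/i.exec(comment);
  const text = (match?.[1] ?? "").trim().toLowerCase();
  if (text.length === 0) return undefined;
  return COMMANDS.find((cmd) => text === cmd.name || text.startsWith(`${cmd.name} `));
}

function mayRun(cmd: BotCommand, role: Role): boolean {
  if (role === "maintainer") return true;
  if (cmd.maintainerOnly) return false;
  // re-review / re-run: PR or issue authors and users with write access.
  return role === "author" || role === "writer";
}

// Fresh-review commands never start repair, rebase, CI repair, or automerge.
function startsRepair(cmd: BotCommand): boolean {
  return cmd.kind === "repair";
}
```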

Summary
The branch stops mapping raw Codex app-server interrupted turns to OpenClaw aborted, adds focused projector/runner regressions, and adds a changelog entry.

Reproducibility: yes. Current main source maps raw app-server interrupted to aborted, and the embedded runner's no-visible-answer guard skips aborted attempts; the PR body also includes a focused failing assertion from main.

PR rating
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Summary: Strong focused proof, narrow implementation, and no blocking findings leave only maintainer acceptance of the interrupted-status contract.

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body and comments include redacted after-fix live output for focused local tests, Crabbox SSH runs, and a live Codex harness path exercising the changed behavior.

Risk before merge

  • If a supported Codex app-server path emits raw interrupted for genuine out-of-band cancellation without OpenClaw abort-controller, timeout, or abort-marker evidence, users may now see no-visible-answer recovery instead of cancellation handling.

Maintainer options:

  1. Merge With Explicit-Abort Contract (recommended)
    Merge if maintainers accept that only OpenClaw abort, timeout, abort-marker, or abort-controller evidence should suppress visible-answer recovery for Codex turns.
  2. Require Interruption-Origin Signal
    Ask for an app-server interruption-origin signal first if raw interrupted must sometimes mean user cancellation in supported deployments without OpenClaw-side abort evidence.
  3. Pause For Upstream Semantics
    Hold the PR if maintainers need upstream confirmation that raw interrupted is overloaded in a way this patch would mishandle.
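Option 2 would need a signal that this patch does not add. The sketch below shows, under that assumption, how such a signal could sit alongside the explicit-abort contract from option 1. `interruptionOrigin` is a hypothetical field. `abortSignal` is typed structurally so that an `AbortController`'s `signal`, or a plain object, satisfies it.

```typescript
// Hypothetical: no interruption-origin field exists today; option 2 would add one.
type InterruptionOrigin = "user" | "server" | "unknown";

interface TurnEvidence {
  abortSignal: { readonly aborted: boolean }; // e.g. an OpenClaw AbortController's signal
  timedOut: boolean;
  abortMarker: boolean;
  appServerStatus: "completed" | "interrupted" | "failed";
  interruptionOrigin?: InterruptionOrigin;
}

// Option 1 (explicit-abort contract): only OpenClaw-side evidence suppresses recovery.
function suppressRecoveryOption1(e: TurnEvidence): boolean {
  return e.abortSignal.aborted || e.timedOut || e.abortMarker;
}

// Option 2: additionally honor an upstream signal that says the user cancelled.
function suppressRecoveryOption2(e: TurnEvidence): boolean {
  if (suppressRecoveryOption1(e)) return true;
  return e.appServerStatus === "interrupted" && e.interruptionOrigin === "user";
}
```

Under option 1, a raw `interrupted` with no OpenClaw-side evidence always reaches visible-answer recovery. Option 2 narrows that only when upstream positively identifies a user cancellation.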

Next step before merge
No automated repair remains; the next action is maintainer merge/check gating with the explicit-abort contract acknowledged.

Security
Cleared: The diff changes Codex app-server state projection, focused tests, and changelog text only; it adds no dependency, permission, secret, network, install, or code-execution surface.

Review details

Best possible solution:

Land the narrow projection change if maintainers accept the explicit-abort contract, and leave adjacent timeout-default and long-reply truncation behavior in their separate follow-ups.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source maps raw app-server interrupted to aborted, and the embedded runner's no-visible-answer guard skips aborted attempts; the PR body also includes a focused failing assertion from main.

Is this the best way to solve the issue?

Yes, with the contract caveat. Removing only the blanket interrupted-to-aborted mapping while preserving explicit abort, timeout, abort-marker, and abort-controller paths is the narrowest maintainable fix I found.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments include redacted after-fix live output for focused local tests, Crabbox SSH runs, and a live Codex harness path exercising the changed behavior.

Label justifications:

  • P1: The PR fixes a user-facing Codex agent path where tool-only interrupted turns can suppress visible no-answer recovery.
  • merge-risk: 🚨 message-delivery: Changing interrupted terminal-turn classification can change whether a user sees cancellation handling or no-visible-answer recovery for a Codex turn.
  • rating: 🦞 diamond lobster: Current PR rating is 🦞 diamond lobster because proof is 🦞 diamond lobster, patch quality is 🦞 diamond lobster, and strong focused proof, narrow implementation, and no blocking findings leave only maintainer acceptance of the interrupted-status contract.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body and comments include redacted after-fix live output for focused local tests, Crabbox SSH runs, and a live Codex harness path exercising the changed behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments include redacted after-fix live output for focused local tests, Crabbox SSH runs, and a live Codex harness path exercising the changed behavior.

Acceptance criteria:

  • CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/event-projector.test.ts
  • CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "keeps upstream cancellation aborted"
  • CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts -t "app-server interrupted tool-only output|sparse bash output"
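The acceptance criteria could be scripted as shown below. This is a hypothetical helper, not part of the repository. It assumes Node.js is run from the repository root, and `RUN_ACCEPTANCE` is an invented opt-in variable. Because `spawnSync` is called without a shell, the `-t` patterns are passed as plain argv strings and need no quoting.

```typescript
import { spawnSync } from "node:child_process";

const RUNNER = "scripts/run-vitest.mjs";
const EXTENSIONS_CONFIG = "test/vitest/vitest.extensions.config.ts";
const EMBEDDED_CONFIG = "test/vitest/vitest.agents-pi-embedded.config.ts";

// The three acceptance commands as argv arrays for `node`.
const ACCEPTANCE_COMMANDS: string[][] = [
  [RUNNER, "run", "--config", EXTENSIONS_CONFIG,
    "extensions/codex/src/app-server/event-projector.test.ts"],
  [RUNNER, "run", "--config", EXTENSIONS_CONFIG,
    "extensions/codex/src/app-server/run-attempt.test.ts",
    "-t", "keeps upstream cancellation aborted"],
  [RUNNER, "run", "--config", EMBEDDED_CONFIG,
    "src/agents/pi-embedded-runner/run.incomplete-turn.test.ts",
    "-t", "app-server interrupted tool-only output|sparse bash output"],
];

function runAll(): boolean {
  for (const args of ACCEPTANCE_COMMANDS) {
    // CI=1 matches the documented invocation; stop at the first failure.
    const result = spawnSync(process.execPath, args, {
      stdio: "inherit",
      env: { ...process.env, CI: "1" },
    });
    if (result.status !== 0) {
      console.error(`failed: node ${args.join(" ")}`);
      return false;
    }
  }
  return true;
}

// Opt-in guard so loading this file never spawns the test runs by accident.
if (process.env.RUN_ACCEPTANCE === "1") {
  process.exitCode = runAll() ? 0 : 1;
}
```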

What I checked:

Likely related people:

  • Peter Steinberger: Blame for the current interrupted projection and incomplete-turn helper lines points to 5ed8bbc69468de6cf48a5c874876b282957fef76, and earlier history shows he added and maintained the Codex app-server lifecycle surface. (role: introduced behavior and recent area contributor; confidence: high; commits: 31a0b7bd42a5, 545490c5920d, 5ed8bbc69468; files: extensions/codex/src/app-server/event-projector.ts, extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run/incomplete-turn.ts)
  • Vincent Koc: He carried adjacent Codex app-server timeout/config work, rebased this PR, and left the maintainer stack note separating this interrupted-turn fix from related timeout and truncation work. (role: recent stack reviewer and adjacent owner; confidence: high; commits: 60d200f79719, f1cc8f0cfc7c, 859eb0666282; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/config.ts, docs/plugins/codex-harness.md)
  • rozmiarD: Beyond authoring this PR, the same contributor authored merged Codex app-server dynamic-tool diagnostics work in 1912be8619fbd874e27c67b1e967161b273cff18, which touches the same run-attempt lifecycle stack. (role: recent adjacent contributor; confidence: medium; commits: 1912be8619fb; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6bd430ee3517.

@rozmiarD rozmiarD force-pushed the fix/codex-app-server-interrupted-turns branch from e7405a3 to 16981a7 Compare May 20, 2026 08:08
@clawsweeper clawsweeper Bot added these labels on May 20, 2026:
  • rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.)

@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Pearl Test Hopper

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
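The hatchability rules can be expressed as a short predicate. This is a hypothetical sketch. The `PrRecord` shape is an assumption, and only the label strings are copied from the rules above.

```typescript
// Hypothetical model of the hatchability rules; label strings copied from the text.
type PrState = "open" | "closed" | "merged";

interface PrRecord {
  state: PrState;
  durableLabels: string[]; // labels as recorded in the durable review record
}

const HATCHABLE_LABELS = new Set([
  "status: 👀 ready for maintainer look",
  "status: 🚀 automerge armed",
  "clawsweeper:automerge",
]);

function isHatchable(pr: PrRecord): boolean {
  // Merged PRs are final, so they are always hatchable.
  if (pr.state === "merged") return true;
  // Open PRs, and closed-but-unmerged PRs, need one of the hatchable labels.
  return pr.durableLabels.some((label) => HATCHABLE_LABELS.has(label));
}
```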

Rarity: 🥚 common.
Trait: sniffs out flaky tests.
Image traits: location CI tidepool; accessory lint brush; palette cobalt, lime, and pearl; mood calm; pose curling around a status light; shell matte ceramic shell; lighting cool dashboard glow; background soft code-shaped tiles.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.
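Deterministic seeding of this kind is commonly done by hashing a stable key. The sketch below is an assumption about the approach, not ClawSweeper's code. It uses 32-bit FNV-1a as an example hash, keys the creature on repository and PR number only, and lets the head SHA affect just a separate cosmetic variant. `creatureIndex` and `visualVariant` are hypothetical names.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a simple, deterministic example hash.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical: the creature depends only on repository + PR number...
function creatureIndex(repo: string, prNumber: number, creatureCount: number): number {
  return fnv1a32(`${repo}#${prNumber}`) % creatureCount;
}

// ...while the reviewed head SHA may only vary a cosmetic detail.
function visualVariant(repo: string, prNumber: number, headSha: string, variantCount: number): number {
  return fnv1a32(`${repo}#${prNumber}@${headSha}`) % variantCount;
}
```

For example, `creatureIndex("owner/repo", 84494, 12)` returns the same index on every run, no matter which head SHA was reviewed. Here `"owner/repo"` is a placeholder.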

@rozmiarD

Contributor Author

Live behavior proof for PR #84494

Commit under test: 16981a7912a1b3a766810a7837f1d67aeb5a2b1e

What was run:

OPENCLAW_LIVE_TEST=1 \
OPENCLAW_LIVE_CODEX_HARNESS=1 \
OPENCLAW_LIVE_CODEX_HARNESS_AUTH=codex-auth \
OPENCLAW_LIVE_CODEX_HARNESS_CODE_MODE_ONLY=1 \
OPENCLAW_LIVE_CODEX_HARNESS_SUBAGENT_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_CHAT_IMAGE_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0 \
OPENCLAW_LIVE_CODEX_HARNESS_REQUEST_TIMEOUT_MS=600000 \
timeout 1200s node scripts/test-live.mjs --codex-harness -- \
  src/gateway/gateway-codex-harness.live.test.ts

Result:

[gateway-codex-live] client-connected
[gateway-codex-live] agent-event {"stream":"codex_app_server.lifecycle","data":{"phase":"thread_ready","threadId":"[redacted]"}}
[gateway-codex-live] agent-event {"stream":"codex_app_server.lifecycle","data":{"phase":"turn_starting","threadId":"[redacted]"}}
[gateway-codex-live] agent-event {"stream":"assistant","data":{"text":"CODEX-HARNESS-E97A46"}}
[gateway-codex-live] first-turn {"firstText":"CODEX-HARNESS-E97A46"}

[gateway-codex-live] agent-event {"stream":"assistant","data":{"text":"CODEX-HARNESS-RESUME-78E5BE"}}
[gateway-codex-live] second-turn {"secondText":"CODEX-HARNESS-RESUME-78E5BE"}

[gateway-codex-live] code-mode-only-tool-probe:start {"sessionKey":"agent:dev:live-codex-harness"}
[gateway-codex-live] agent-event {"stream":"tool","data":{"phase":"start","name":"sessions_list","toolCallId":"[redacted]","meta":"1","args":{"limit":1,"includeLastMessage":false}}}
[gateway-codex-live] agent-event {"stream":"tool","data":{"phase":"result","name":"sessions_list","toolCallId":"[redacted]","meta":"1","isError":false,"result":{"success":true}}}
[gateway-codex-live] agent-event {"stream":"codex_app_server.item","data":{"phase":"completed","itemId":"[redacted]","type":"dynamicToolCall"}}
[gateway-codex-live] agent-event {"stream":"codex_app_server.item","data":{"phase":"started","itemId":"[redacted]","type":"agentMessage"}}
[gateway-codex-live] agent-event {"stream":"codex_app_server.item","data":{"phase":"completed","itemId":"[redacted]","type":"agentMessage"}}
[gateway-codex-live] agent-event {"stream":"assistant","data":{"text":"CODEX-CODEMODE-TOOL-2A2648"}}
[gateway-codex-live] agent-event {"stream":"lifecycle","data":{"phase":"end"}}
[gateway-codex-live] code-mode-only-tool-probe:done

Test Files  1 passed (1)
Tests  1 passed | 1 skipped (2)

This is the real OpenClaw gateway -> plugin-owned Codex app-server path. It shows:

  • a normal Codex turn produced a visible assistant response,
  • a resumed turn produced a new visible assistant response without restarting the session,
  • a tool-only phase (sessions_list, isError=false) was followed by a final assistant message,
  • the run reached lifecycle end and Vitest passed.
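These checks can be expressed as an ordering check over the logged `agent-event` stream. This is a hypothetical sketch. The `AgentEvent` shape is trimmed from the log lines, and `toolPhaseFollowedByAnswer` is an illustrative name, not a helper from the live test.

```typescript
// Event shape trimmed from the logged `agent-event` lines.
interface AgentEvent {
  stream: string;
  data: Record<string, unknown>;
}

// True when a successful tool result is later followed by a non-blank
// assistant message, and the run then reaches lifecycle end.
function toolPhaseFollowedByAnswer(events: AgentEvent[]): boolean {
  let sawToolResult = false;
  let sawAnswerAfterTool = false;
  for (const { stream, data } of events) {
    if (stream === "tool" && data.phase === "result" && data.isError === false) {
      sawToolResult = true;
      sawAnswerAfterTool = false; // a later tool call needs its own answer
    } else if (sawToolResult && stream === "assistant") {
      const text = data.text;
      if (typeof text === "string" && text.trim().length > 0) sawAnswerAfterTool = true;
    } else if (sawAnswerAfterTool && stream === "lifecycle" && data.phase === "end") {
      return true;
    }
  }
  return false;
}
```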

Additional raw Codex app-server proof:

PROOF_MODE=tool timeout 300s node /tmp/openclaw-upstream-drafts/codex-appserver-live-proof.mjs

notification {"method":"item/started","type":"commandExecution"}
notification {"method":"item/completed","type":"commandExecution"}
notification {"method":"rawResponseItem/completed","type":"function_call_output"}
notification {"method":"item/started","type":"agentMessage"}
notification {"method":"item/completed","type":"agentMessage"}
notification {"method":"turn/completed","status":"completed"}
summary {"mode":"tool","completedStatus":"completed","sawTurnCompleted":true,"sawAgentMessage":true,"assistantContainsToken":true}

This raw app-server proof exercised a no-output shell command (/bin/bash -lc true) followed by a final assistant message.

Cancellation/interrupt proof:

PROOF_MODE=interrupt timeout 180s node /tmp/openclaw-upstream-drafts/codex-appserver-live-proof.mjs

interrupting turn {"threadId":"[redacted]","turnId":"[redacted]"}
notification {"method":"turn/completed","status":"interrupted"}
summary {"mode":"interrupt","completedStatus":"interrupted","sawTurnCompleted":true,"sawAgentMessage":false,"assistantContainsToken":false}

This confirms that explicit interruption remains distinguishable and does not synthesize a fake final answer.
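Both raw app-server proof summaries have the same shape, so one function can derive either from the notification sequence. This is a hypothetical sketch. The proof script itself (`codex-appserver-live-proof.mjs`) is not shown, so `summarize` and the optional `text` field are assumptions; `text` stands in for wherever agent message text actually arrives. Only the summary field names are copied from the logged output.

```typescript
// Notification shape trimmed from the logged lines; `text` is hypothetical.
interface Notification {
  method: string;
  type?: string;
  status?: string;
  text?: string;
}

interface ProofSummary {
  completedStatus: string | null;
  sawTurnCompleted: boolean;
  sawAgentMessage: boolean;
  assistantContainsToken: boolean;
}

function summarize(notifications: Notification[], token: string): ProofSummary {
  let completedStatus: string | null = null;
  let sawTurnCompleted = false;
  let sawAgentMessage = false;
  let assistantText = "";
  for (const n of notifications) {
    if (n.method === "item/completed" && n.type === "agentMessage") {
      sawAgentMessage = true;
      assistantText += n.text ?? "";
    } else if (n.method === "turn/completed") {
      sawTurnCompleted = true;
      completedStatus = n.status ?? null;
    }
  }
  return {
    completedStatus,
    sawTurnCompleted,
    sawAgentMessage,
    // An empty token would trivially match, so require a non-empty one.
    assistantContainsToken: token.length > 0 && assistantText.includes(token),
  };
}
```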

Crabbox status:

Crabbox SSH target check:
REMOTE_CODEX_AUTH_MISSING
node v24.15.0
pnpm 11.1.0
lease cleanup stopped=true

I did not copy local Codex auth secrets to the lab VM. The prior focused regression test did pass through Crabbox on the SSH target:

CI=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts \
  extensions/codex/src/app-server/run-attempt.test.ts \
  -t "keeps upstream cancellation aborted"

1 passed | 185 skipped
lease cleanup stopped=true

Security note: local paths, thread IDs, tool call IDs, account identity, and private lab host details were redacted from this public proof.

@rozmiarD

Contributor Author

@clawsweeper re-review

Live behavior proof has been supplied in #84494 (comment). The Real behavior proof check is passing, and the proof includes gateway Codex app-server tool-followed-by-final-output plus explicit turn/interrupt behavior.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@rozmiarD rozmiarD force-pushed the fix/codex-app-server-interrupted-turns branch from 16981a7 to 551ffd1 Compare May 20, 2026 11:36
@rozmiarD

Contributor Author

@clawsweeper re-review

I added the exact diagnostic proof requested and pushed commit 551ffd124a.

The new proof in the PR body covers:

  • app-server turn/completed status=interrupted
  • successful tool-only bash commandExecution with empty output and exitCode=0
  • no assistant text projected
  • aborted=false, timedOut=false
  • resolveIncompleteTurnPayloadText(...) reaches the no-visible-answer recovery text
  • explicit cancellation counter-proof remains aborted=true and does not synthesize a final answer

Focused local proof passed: event-projector.test.ts 61 passed; cancellation regression 1 passed / 185 skipped.
Crabbox SSH proof passed for the exact diagnostic pair: 2 passed / 59 skipped, lease cleanup stopped=true.
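The projection rule these proofs target can be sketched as follows, assuming a projector that records explicit OpenClaw-side cancellation. Only `markAborted()` and `markTimedOut()` come from this PR. `TurnOutcomeProjector`, `ProjectedOutcome`, and the status union are hypothetical and may not match the real event projector.

```typescript
// Hypothetical status union; only "interrupted" is confirmed by the proof output.
type AppServerTurnStatus = "completed" | "interrupted" | "failed";

interface ProjectedOutcome {
  terminalStatus: AppServerTurnStatus;
  aborted: boolean;
  timedOut: boolean;
}

// Hypothetical projector; markAborted()/markTimedOut() mirror the PR description.
class TurnOutcomeProjector {
  private abortRequested = false;
  private timeoutHit = false;

  // Called only when OpenClaw itself cancels the run.
  markAborted(): void {
    this.abortRequested = true;
  }

  // Called only when OpenClaw's own deadline fires.
  markTimedOut(): void {
    this.timeoutHit = true;
  }

  project(status: AppServerTurnStatus): ProjectedOutcome {
    return {
      // The app-server status is kept as terminal state in every case...
      terminalStatus: status,
      // ...but "interrupted" alone no longer implies an OpenClaw abort.
      aborted: this.abortRequested || this.timeoutHit,
      timedOut: this.timeoutHit,
    };
  }
}
```

The design choice is that app-server `interrupted` stays visible as terminal state, while the abort bit is owned solely by OpenClaw's own cancellation and timeout paths.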

@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

rozmiarD force-pushed the fix/codex-app-server-interrupted-turns branch from 551ffd1 to 2010177 on May 20, 2026 11:43
@rozmiarD

Contributor Author

@clawsweeper re-review

I replaced the temporary cross-boundary diagnostic with the final proof layout and pushed commit 2010177bc7.

The exact proof now covers:

  • projector test: app-server interrupted + successful empty bash output projects aborted=false, no assistant text, bash tool metadata preserved;
  • projector counter-test: explicit abort remains aborted=true for the same interrupted/tool-only shape;
  • core incomplete-turn test: a projector-compatible interrupted tool-only/no-assistant attempt reaches resolveIncompleteTurnPayloadText(...) and produces the no-visible-answer recovery text;
  • core counter-check: explicit cancellation returns null, so it does not synthesize a final answer.

Focused local proof passed and Crabbox SSH proof passed with the corrected configs; PR body has the exact commands and redacted output.
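The core incomplete-turn check and its counter-check can be sketched as a gate that runs after projection. `resolveIncompleteTurnPayloadTextSketch`, `AttemptSnapshot`, and the recovery wording are hypothetical stand-ins. The real `resolveIncompleteTurnPayloadText(...)` signature and recovery text are not shown in this thread.

```typescript
// Hypothetical attempt shape; the real resolver's input type is not shown here.
interface AttemptSnapshot {
  aborted: boolean; // set only by explicit OpenClaw cancellation or timeout
  assistantTexts: string[]; // visible assistant text projected for the turn
}

// Placeholder wording, not OpenClaw's actual recovery text.
const NO_VISIBLE_ANSWER_TEXT = "The turn ended without a visible answer.";

function resolveIncompleteTurnPayloadTextSketch(attempt: AttemptSnapshot): string | null {
  // Explicit cancellation owes no answer: never synthesize one.
  if (attempt.aborted) return null;
  // Whitespace-only text does not count as a visible answer.
  const hasVisibleAnswer = attempt.assistantTexts.some((t) => t.trim().length > 0);
  if (hasVisibleAnswer) return null;
  // Tool-only or empty turn that was not cancelled: surface recovery text.
  return NO_VISIBLE_ANSWER_TEXT;
}
```

Under this sketch, an interrupted tool-only attempt with `aborted: false` and no assistant text returns the recovery text, while the same shape with `aborted: true` returns `null`.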

@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@rozmiarD

Contributor Author

@clawsweeper re-review

I added exact runtime proof to the PR body under exact-runtime-interrupted-tool-only-proof-2026-05-20. The proof runs the real OpenClaw projector and incomplete-turn recovery modules locally and through Crabbox SSH. It shows that a turn with turn/completed status=interrupted, a successful empty bash commandExecution, and no assistant text reaches no-visible-answer recovery, while explicit cancellation remains aborted=true and returns recovery=null.

@rozmiarD

Contributor Author

CI status update after exact runtime proof:

  • Real behavior proof is passing after the new runtime proof section was added.
  • I attempted gh run rerun 26160301231 --failed, but GitHub rejected it with "Must have admin rights to Repository", so I cannot rerun failed jobs from this account.
  • Sampled failed job logs point outside this PR diff:
    • checks-node-core-fast: src/security/windows-acl.test.ts failed on localized SYSTEM account classification.
    • checks-node-agentic-plugin-sdk: src/plugin-sdk/fs-safe-compat.test.ts failed with FsSafeError: Python helper is required for pinned writes on this platform.
    • checks-node-agentic-gateway-core: src/gateway/managed-image-attachments.test.ts failed while preparing image attachments.
  • The PR diff only touches Codex app-server projection and focused lifecycle tests, and the focused local + Crabbox runtime proof for that path passed.

A maintainer rerun of the failed jobs is needed, or a maintainer can classify these CI failures as unrelated to this PR.
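One way to sanity-check the "outside this PR diff" claim is a directory-overlap heuristic over the changed files and the failing test files. This is a hypothetical helper, not part of OpenClaw or its CI, and overlap is only a hint: shared imports can still couple a distant test to a changed module.

```typescript
// Hypothetical triage helper: returns failed test files whose directory the diff never touches.
function dirOf(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? "" : path.slice(0, i);
}

function failuresOutsideDiff(changedFiles: string[], failedTestFiles: string[]): string[] {
  // Deduplicate the directories touched by the diff.
  const touchedDirs = Array.from(new Set(changedFiles.map(dirOf)));
  return failedTestFiles.filter((f) => {
    const dir = dirOf(f);
    // Treat a failure as related if it sits in, below, or above a touched directory.
    const related = touchedDirs.some(
      (d) => dir === d || dir.startsWith(d + "/") || d.startsWith(dir + "/"),
    );
    return !related;
  });
}
```

Fed the three failing files listed above and a diff under extensions/codex/src/app-server/, this heuristic flags all three as outside the changed directories, which is consistent with, but does not prove, the unrelated-failure reading.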

clawsweeper Bot added the labels proof: sufficient (ClawSweeper judged the real behavior proof convincing), rating: 🦞 diamond lobster (very strong PR readiness with only minor maintainer review expected), and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR), and removed rating: 🦐 gold shrimp (decent PR readiness signal, but merge confidence is limited) and status: 📣 needs proof (the PR needs real behavior proof before ClawSweeper can clear the contributor ask) on May 20, 2026
openclaw-barnacle Bot removed the proof: sufficient label on May 20, 2026
vincentkoc self-assigned this on May 21, 2026
@vincentkoc

Member

maintainer follow-up after reviewing this with the wider Codex app-server stack: #83200/#83222, #83476, #84135/#84974, #84137, #84492, and #84516.

I rebased this PR onto current main and pushed the contributor branch to 2bff2ae444aae5ba8d13911309f64cf92af52f4c. Merge state is clean. The review thread about treating interrupted terminal turns as aborted is resolved after rechecking the cancellation split: explicit cancellation still stays aborted, while app-server interrupted/tool-only terminal turns remain visible-answer eligible.

proof:

  • focused Vitest interruption cases passed locally for event-projector, run-attempt, and pi-embedded-runner coverage.
  • Crabbox fresh PR focused run passed: run_ecb2930d5da9 / cbx_fd2b493bde91.
  • Blacksmith Testbox changed gate passed: tbx_01ks5f47e1wjjf8x4tv8tck15x, exit 0.

scope boundary: this is the #84492 interrupted/tool-only visibility fix. It should not be treated as closing #84516's long-reply/truncated-final-payload behavior or the #84137 post-tool raw assistant semantic decision.

vincentkoc force-pushed the fix/codex-app-server-interrupted-turns branch from 2bff2ae to 48cc43a on May 22, 2026 03:31
openclaw-barnacle Bot removed the proof: sufficient label on May 22, 2026
clawsweeper Bot added the proof: sufficient label on May 22, 2026
vincentkoc force-pushed the fix/codex-app-server-interrupted-turns branch from 48cc43a to a5d1958 on May 22, 2026 08:52
openclaw-barnacle Bot removed the proof: sufficient label on May 22, 2026
clawsweeper Bot added the proof: sufficient label on May 22, 2026
vincentkoc force-pushed the fix/codex-app-server-interrupted-turns branch from a5d1958 to 256cc5f on May 22, 2026 09:21
openclaw-barnacle Bot removed the proof: sufficient label on May 22, 2026
clawsweeper Bot added the proof: sufficient label on May 22, 2026
vincentkoc force-pushed the fix/codex-app-server-interrupted-turns branch from 256cc5f to 90fb636 on May 22, 2026 10:11
openclaw-barnacle Bot removed the proof: sufficient label on May 22, 2026
clawsweeper Bot added the proof: sufficient label on May 22, 2026
vincentkoc merged commit 8523e09 into openclaw:main on May 22, 2026
110 checks passed
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…84494)

* fix(codex): keep interrupted turns visible-answer eligible

* docs(changelog): note codex interrupted recovery

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

  • agents (Agent runtime and tooling)
  • extensions: codex
  • merge-risk: 🚨 message-delivery 🚨 (May drop, duplicate, misroute, suppress, or wrongly target messages.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected.)
  • size: S
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codex app-server interrupted turns can suppress no-visible-answer handling

2 participants