
Handle Codex turns missing completion #85107

Merged

joshavant merged 2 commits into main from fix/codex-missing-turn-completed on May 21, 2026

@joshavant

Contributor

Summary

  • handle Codex app-server turns that stop before a matching turn/completed
  • mark missing-terminal timeouts replay-unsafe only from observed execution/delivery facts, not shell command parsing
  • preserve retry-safe outcomes for pre-execution dynamic-tool validation and blocked tool calls
  • release the session after the timeout so the next turn can run
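Below is a minimal TypeScript sketch of the timeout handling described in the summary. All names in it (`TurnFacts`, `MissingTerminalOutcome`, `SessionLane`, `buildMissingTerminalOutcome`, `onMissingTerminal`) are hypothetical. Only the outcome fields `status`, `aborted`, `replayInvalid`, and `livenessState` come from the PR's reported E2E evidence. Leaving `livenessState` unset in the retry-safe case is an assumption.

```typescript
// Hypothetical shapes; the real OpenClaw types are not shown on this page.
interface TurnFacts {
  // True only when execution or delivery was actually observed
  // (e.g. a completed command item), never inferred from parsing shell text.
  possibleSideEffectObserved: boolean;
}

interface MissingTerminalOutcome {
  status: "timeout";
  aborted: true;
  replayInvalid: boolean;
  // Assumption: the retry-safe case leaves liveness unset.
  livenessState?: "abandoned";
}

// Hypothetical stand-in for whatever serializes turns on one session.
class SessionLane {
  private busy = false;

  tryAcquire(): boolean {
    if (this.busy) return false;
    this.busy = true;
    return true;
  }

  release(): void {
    this.busy = false;
  }

  get isBusy(): boolean {
    return this.busy;
  }
}

function buildMissingTerminalOutcome(facts: TurnFacts): MissingTerminalOutcome {
  if (facts.possibleSideEffectObserved) {
    // Work may already have happened, so do not imply a replay is safe.
    return { status: "timeout", aborted: true, replayInvalid: true, livenessState: "abandoned" };
  }
  // Nothing executed yet (e.g. pre-execution validation failed or the call was blocked).
  return { status: "timeout", aborted: true, replayInvalid: false };
}

// Called when the completion-idle timer fires without a matching turn/completed.
function onMissingTerminal(lane: SessionLane, facts: TurnFacts): MissingTerminalOutcome {
  try {
    return buildMissingTerminalOutcome(facts);
  } finally {
    // Always free the session so the next turn can run.
    lane.release();
  }
}
```

Releasing the lane in `finally` means the session is freed even if building the outcome throws. That mirrors the "release the session after the timeout" goal without tying it to which outcome was produced.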

Fixes #84076.

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts extensions/codex/src/app-server/event-projector.test.ts -- --reporter=dot
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -- --reporter=dot
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/side-question.test.ts extensions/codex/src/conversation-turn-collector.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts -- --reporter=dot
  • git diff --check
  • AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode local
  • AWS Crabbox deterministic E2E: run run_05062ccbfbc4, provider aws, lease cbx_2de871a3b3ca, exit 0
  • AWS Crabbox live Codex provider sanity: run run_b218cba3016f, provider aws, lease cbx_3bdfe3ed80bf, exit 0

Real behavior proof

Behavior addressed: Codex app-server may produce evidence of work and then stop before OpenClaw receives a matching turn/completed; OpenClaw should not leave the session stuck or imply retry safety after possible side effects.

Real environment tested: Direct AWS Crabbox (provider=aws) with a built OpenClaw gateway and bundled Codex plugin; then a live Codex provider Docker harness using OPENCLAW_LIVE_CODEX_HARNESS_AUTH=api-key.

Exact steps or command run after this patch: Deterministic Crabbox E2E drove a real gateway through the Codex plugin with a fake app-server that completed a command item and withheld turn/completed, then sent a second turn. Live sanity ran pnpm test:docker:live-codex-harness on AWS with optional image/MCP/subagent/guardian probes disabled.
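The deterministic E2E relies on a fake app-server that completes an item and then goes silent. The sketch below models that boundary under a few assumptions: the fake server follows a scripted list of timestamped notifications, and the completion-idle timeout restarts on every notification. `ScriptedEvent`, `detectTurnEnd`, and the `idleMs` value are hypothetical. Only the method names `item/completed` and `turn/completed` come from this PR.

```typescript
// Hypothetical notification shape for a scripted fake app-server.
interface ScriptedEvent {
  atMs: number; // arrival time relative to turn start
  method: "item/started" | "item/completed" | "turn/completed";
}

type TurnEnd =
  | { kind: "completed"; atMs: number }
  | { kind: "idle-timeout"; atMs: number };

// Walks the script in time order. The idle timer restarts on every
// notification. If the gap to the next one exceeds idleMs, or the script
// ends, before turn/completed arrives, the turn ends in an idle timeout.
function detectTurnEnd(events: ScriptedEvent[], idleMs: number): TurnEnd {
  const sorted = [...events].sort((a, b) => a.atMs - b.atMs);
  let lastActivity = 0;
  for (const ev of sorted) {
    if (ev.atMs - lastActivity > idleMs) {
      return { kind: "idle-timeout", atMs: lastActivity + idleMs };
    }
    if (ev.method === "turn/completed") {
      return { kind: "completed", atMs: ev.atMs };
    }
    lastActivity = ev.atMs;
  }
  // Script exhausted without a terminal event, so the timer eventually fires.
  return { kind: "idle-timeout", atMs: lastActivity + idleMs };
}

// The injected boundary: a command item completes, turn/completed never comes.
const withheldTerminal: ScriptedEvent[] = [
  { atMs: 10, method: "item/started" },
  { atMs: 50, method: "item/completed" },
];
```

With a 1000 ms idle window, `detectTurnEnd(withheldTerminal, 1000)` reports an idle timeout at 1050 ms. Appending `{ atMs: 80, method: "turn/completed" }` reports a normal completion at 80 ms instead.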

Evidence after fix: Deterministic run run_05062ccbfbc4 on lease cbx_2de871a3b3ca returned first turn status=timeout, aborted=true, replayInvalid=true, livenessState=abandoned, and the replay-unsafe timeout text; second turn returned status=ok and SECOND_RUN_OK. Live run run_b218cba3016f on lease cbx_3bdfe3ed80bf passed src/gateway/gateway-codex-harness.live.test.ts with 1 passed, 1 skipped.

Observed result after fix: Missing terminal confirmation after possible side effects produces an abandoned replay-invalid timeout and releases the session; normal live Codex app-server turns still complete and resume through the gateway.

What was not tested: A naturally occurring live provider omission of turn/completed; the deterministic E2E injects that boundary failure because live reproduction is not reliable on demand.

openclaw-barnacle (bot) added the labels agents, extensions: codex, size: L, and maintainer on May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch teaches the Codex app-server and embedded runner to correlate nested turn completions, mark missing-terminal timeouts with side-effect-aware replay metadata, surface a specific timeout outcome, and adds focused tests plus a changelog entry.
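The review summary mentions correlating nested turn completions. One way to picture this is a rule that only a completion for the outer active turn ends the run. That framing assumes every turn notification carries the id of its turn and that nested turns (for example a side question) announce themselves before they complete. `TurnNotification`, `TurnCorrelator`, and the `turnId` field are hypothetical and are not taken from the diff.

```typescript
// Hypothetical notification shape; the real app-server payload may differ.
interface TurnNotification {
  method: "turn/started" | "turn/completed";
  turnId: string;
}

class TurnCorrelator {
  private readonly activeTurnId: string;
  // Turns started while the outer turn is running (e.g. nested side questions).
  private readonly nested = new Set<string>();

  constructor(activeTurnId: string) {
    this.activeTurnId = activeTurnId;
  }

  // Returns true only when the notification completes the outer active turn.
  handle(n: TurnNotification): boolean {
    if (n.method === "turn/started") {
      if (n.turnId !== this.activeTurnId) this.nested.add(n.turnId);
      return false;
    }
    if (n.turnId === this.activeTurnId) return true;
    // A nested completion must not be mistaken for the outer turn's completion.
    this.nested.delete(n.turnId);
    return false;
  }

  get openNestedTurns(): number {
    return this.nested.size;
  }
}
```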

Reproducibility: yes. The linked issue has live Discord/Telegram logs for item/completed followed by no turn/completed, and the PR supplies deterministic gateway proof of the same boundary; I did not execute tests in this read-only review.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: Strong proof and focused tests support the patch, while the changed session-state semantics still need maintainer acceptance before merge.

Rank-up moves:

  • Maintainer should explicitly accept the abandoned/replay-invalid behavior for missing terminal confirmation after observed Codex tool activity.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix AWS Crabbox deterministic gateway proof and live Codex provider sanity output with concrete run IDs and observed results.

Risk before merge

  • The patch changes session replay/liveness semantics for Codex turns with observed tool activity; maintainers should explicitly accept that missing terminal events after possible side effects become abandoned and replay-invalid instead of retry-safe.
  • A naturally occurring live turn/completed omission was not tested on demand; the proof uses deterministic injection for that boundary plus live provider sanity for normal Codex turns.

Maintainer options:

  1. Land With Explicit Replay Semantics (recommended)
    Maintainers can accept the intentional behavior that missing terminal confirmation after observed work marks the run abandoned and replay-invalid, supported by deterministic E2E and live sanity proof.
  2. Require Broader Live Transport Proof
    If maintainers want channel-level evidence before changing liveness semantics, ask for a redacted Discord or Telegram run showing the visible timeout status and subsequent turn recovery.

Next step before merge
No narrow automated repair is indicated; the protected label and changed replay/liveness semantics need maintainer merge judgment.

Security
Cleared: The diff does not add dependencies, workflows, credential handling, or new code-execution sources; the replay-safety changes are conservative for side-effecting tools.

Review details

Best possible solution:

Land the focused recovery path after maintainer review accepts the side-effect-aware replay semantics and keeps the deterministic Crabbox plus live Codex proof attached to the PR verification record.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked issue has live Discord/Telegram logs for item/completed followed by no turn/completed, and the PR supplies deterministic gateway proof of the same boundary; I did not execute tests in this read-only review.

Is this the best way to solve the issue?

Yes as a bounded first fix. The PR avoids automatic replay after possible side effects, surfaces a specific timeout outcome, and leaves broader automatic resume policy to maintainer judgment.
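A runner could consume this metadata by retrying automatically only when the harness did not flag possible side effects. The sketch below assumes the runner receives the `replayInvalid` and `livenessState` fields named in this review. `HarnessTimeout`, `shouldAutoReplay`, and `timeoutNotice` are hypothetical, and the notice wording is illustrative rather than OpenClaw's actual replay-unsafe timeout text. Whether a runner retries automatically at all is the broader policy this PR leaves to maintainers.

```typescript
// Hypothetical shape of the timeout metadata the harness hands to the runner.
interface HarnessTimeout {
  status: "timeout";
  replayInvalid: boolean;
  livenessState?: "abandoned";
}

// Retry automatically only when nothing indicates work may have taken effect.
function shouldAutoReplay(t: HarnessTimeout): boolean {
  return !t.replayInvalid && t.livenessState !== "abandoned";
}

// Illustrative user-facing text, not the real message.
function timeoutNotice(t: HarnessTimeout): string {
  return shouldAutoReplay(t)
    ? "The turn timed out before any work ran; it is safe to retry."
    : "The turn timed out after work may have run; check results before retrying.";
}
```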

Label changes:

  • add P1: The PR targets an urgent Codex app-server stall affecting user-facing agent/channel workflows where turns can abort after productive work.
  • add merge-risk: 🚨 session-state: The diff deliberately changes replay safety and liveness metadata for timed-out Codex turns, which affects how session recovery treats prior tool activity.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix AWS Crabbox deterministic gateway proof and live Codex provider sanity output with concrete run IDs and observed results.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit; strong proof and focused tests support the patch, but the changed session-state semantics still need maintainer acceptance before merge.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix AWS Crabbox deterministic gateway proof and live Codex provider sanity output with concrete run IDs and observed results.


What I checked:

  • Protected PR state: The provided GitHub context shows this PR is open, mergeable, contributor-authored, and carries the protected maintainer label, so this workflow must keep it open for explicit maintainer handling.
  • Current main timeout behavior: Current main aborts the Codex app-server run on completion-idle timeout after waiting for turn/completed, which matches the linked issue's reported failure boundary. (extensions/codex/src/app-server/run-attempt.ts:1553, 0ab1449215f5)
  • PR timeout outcome: The PR head computes a promptTimeoutOutcome for completion-idle timeouts and marks side-effecting cases as replayInvalid: true with livenessState: "abandoned". (extensions/codex/src/app-server/run-attempt.ts:2687, ecb61d596278)
  • Side-effect tracking: The PR head broadens replay metadata from messaging-tool-only evidence to include cron adds, native mutating/MCP tool items, and executed dynamic tools while preserving blocked and pre-execution failures as retry-safe. (extensions/codex/src/app-server/event-projector.ts:328, ecb61d596278)
  • Focused regression coverage: The PR adds tests for completed native command timeouts, executed dynamic-tool timeouts, active mutating items, assistant-only timeout messaging, and runner propagation of harness-provided timeout metadata. (extensions/codex/src/app-server/run-attempt.test.ts:2855, ecb61d596278)
  • Real behavior proof in PR body: The PR body reports AWS Crabbox deterministic E2E proof where the first turn timed out with replayInvalid=true and livenessState=abandoned, the second turn returned SECOND_RUN_OK, and a live Codex provider Docker harness passed. (ecb61d596278)
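The side-effect tracking described above can be sketched as a fold over observed items. The item kinds and phases below are hypothetical labels for the categories the review names: messaging-tool delivery, cron adds, native mutating or MCP tool items, executed dynamic tools, blocked calls, and pre-execution validation failures. They are not the event projector's real types. Treating a still-active mutating or MCP item as possible side effects is an assumption, based on the review's mention of tests for active mutating items.

```typescript
// Hypothetical item categories mirroring the review's description.
type ObservedItem =
  | { kind: "messaging-tool"; delivered: boolean }
  | { kind: "cron-add" }
  | { kind: "native-mutating"; phase: "started" | "completed" }
  | { kind: "mcp-tool"; phase: "started" | "completed" }
  | { kind: "dynamic-tool"; phase: "validation-failed" | "blocked" | "executed" };

// True when an item is evidence that work may already have had an effect.
function mayHaveSideEffect(item: ObservedItem): boolean {
  switch (item.kind) {
    case "messaging-tool":
      return item.delivered;
    case "cron-add":
      return true;
    case "native-mutating":
    case "mcp-tool":
      // Assumption: an active (started) item may already be running, so it counts too.
      return true;
    case "dynamic-tool":
      // Blocked calls and pre-execution validation failures never ran, so they stay retry-safe.
      return item.phase === "executed";
  }
}

// A missing-terminal timeout is replay-invalid if any observed item may have had effects.
function replayInvalidFor(items: ObservedItem[]): boolean {
  return items.some(mayHaveSideEffect);
}
```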

Likely related people:

  • funmerlin: Authored the merged quiescent app-server turn fix that added the existing fail-fast behavior for last-item completion without turn/completed. (role: adjacent fix author; confidence: high; commits: 127156a88a29; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
  • joshavant: Authored the earlier merged Codex app-server liveness fix for account/rate-limit notifications and duplicate timeout suppression, and this PR builds on that same surface. (role: prior area contributor and current PR author; confidence: high; commits: 5fdef4c39e75, 437f322aafed, ecb61d596278; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run.ts)
  • steipete: Recent GitHub file history shows repeated Codex app-server and embedded-runner changes, including current-day edits on run-attempt.ts and broader runtime/refactor work near this path. (role: recent area contributor; confidence: medium; commits: 1d5b5db4d221, 02182d5a3031, 4ff28a77356f; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0ab1449215f5.

clawsweeper (bot) added the labels proof: sufficient, rating: 🐚 platinum hermit, and status: 👀 ready for maintainer look on May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Tiny Clawlet

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: guards the happy path.
Image traits: location branch lighthouse; accessory shell-shaped keyboard; palette amber, ink, and glacier blue; mood focused; pose holding its accessory up for inspection; shell soft velvet shell; lighting gentle morning glow; background little resolved-comment flags.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@joshavant joshavant merged commit 7cda26a into main May 21, 2026
195 of 202 checks passed
@joshavant joshavant deleted the fix/codex-missing-turn-completed branch May 21, 2026 22:02
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix(codex): handle missing turn completion

* docs: add changelog for Codex completion fix

Labels

agents, extensions: codex, maintainer, proof: sufficient, rating: 🐚 platinum hermit, size: L, status: 👀 ready for maintainer look


Development

Successfully merging this pull request may close these issues.

[Bug]: Codex app-server stalls after item/completed, then aborts without recovery/status
