Skip to content

fix(codex): arm completion idle watch for stalled binary after rawResponseItem/completed#87079

Merged
steipete merged 5 commits into
openclaw:mainfrom
Marvinthebored:fix/codex-binary-stall-idle-watch
May 27, 2026
Merged

fix(codex): arm completion idle watch for stalled binary after rawResponseItem/completed#87079
steipete merged 5 commits into
openclaw:mainfrom
Marvinthebored:fix/codex-binary-stall-idle-watch

Conversation

@Marvinthebored

@Marvinthebored Marvinthebored commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Arm the 60s completion idle watch when rawResponseItem/completed arrives with all tracked items completed and no active app-server requests.
  • Keep that raw completion state out of the catch-all completion-watch disarm path so a stalled Codex binary is detected in 60s instead of waiting for the terminal idle timeout.
  • Preserve the longer message_tool_only source-reply guard for raw reasoning completions so valid turns still have time to send the visible message tool call.
  • Add regression coverage for the stalled non-assistant raw completion path, Codex's reasoning item + raw mirror source-reply path, and raw assistant release path.

Root cause

When the Codex binary relays rawResponseItem/completed and all tracked items are already complete, the binary should deliver turn/completed shortly after. If the binary goes silent before turn/completed, current main can disarm the completion idle watch because the notification is not a raw assistant completion, not raw tool output, and not the final item/completed path. That leaves only the long terminal idle timeout for recovery.

Changes

extensions/codex/src/app-server/run-attempt.ts now treats current-turn rawResponseItem/completed with no active items and no active app-server requests as a completion-watch state. It arms the completion idle watch and excludes that notification from the generic disarm block.

For message_tool_only turns, raw reasoning completions are routed through the existing post-reasoning source-reply timeout so they do not prematurely hit the short completion idle timeout before Codex can call the visible message tool.

extensions/codex/src/app-server/run-attempt.test.ts adds regressions for a type: "reasoning" raw completion that should time out promptly in normal mode, and Codex's real reasoning item lifecycle followed by its raw mirror in message-tool-only mode where the turn should keep waiting until turn/completed arrives.

Real behavior proof

Behavior addressed: A real Codex app-server turn can stall after final progress or rawResponseItem/completed without delivering turn/completed; this patch keeps the 60s completion idle watchdog armed so the user gets timeout/retry feedback instead of a long hang.

Real environment tested: Live OpenClaw gateway running a cherry-picked build containing this fix on 2026-05-27, plus local macOS source checkout with Node/Vitest regression proof.

Exact steps or command run after this patch: Live gateway ran a real user Codex turn on the patched build; locally, node scripts/run-vitest.mjs --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose --testTimeout=5000 -t "(arms completion idle watch after non-assistant rawResponseItem/completed with no active items|keeps waiting after reasoning and its raw mirror complete before a visible message call|keeps waiting after reasoning completes before a visible message call|releases the session after a raw assistant response item without turn completion)".

Evidence after fix: Live log reported timeoutMs: 60000, idleMs: 60000 with lastActivityReason: "notification:item/completed"; stderr diagnostics showed the session had stalled in state=processing with activeWorkKind=model_call and no recovery before the watchdog fired. The local regressions assert the completion idle timeout fires with timeoutMs: 5 and lastActivityReason: "notification:rawResponseItem/completed", while the terminal idle timeout does not fire. The companion regressions assert Codex's reasoning item lifecycle plus raw mirror in message_tool_only mode does not interrupt after the short completion timeout, and a raw assistant completion still releases through the assistant-output path even when the completion timeout is shorter.

Observed result after fix: The patched live gateway returned timeout feedback after roughly 60s and allowed retry instead of hanging until the long terminal timeout. The regressions catch the specific non-assistant raw completion path from this PR and the source-reply guard edge found during maintainer review.

What was not tested: A deterministic live reproduction of the exact intermittent type: "reasoning" rawResponseItem/completed stall; the live stall proof covered the same completion-watch recovery behavior, and the targeted regressions cover the exact raw notification shapes plus the raw assistant release path that must remain unchanged.

Verification

  • git diff --check origin/main...HEAD
  • node scripts/run-vitest.mjs --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose --testTimeout=5000 -t "(arms completion idle watch after non-assistant rawResponseItem/completed with no active items|keeps waiting after reasoning and its raw mirror complete before a visible message call|keeps waiting after reasoning completes before a visible message call|releases the session after a raw assistant response item without turn completion)"
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Refs #87071, #86948, #87022, #87070.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 3:19 AM ET / 07:19 UTC.

Summary
The PR changes the Codex app-server run attempt logic to keep the completion idle watchdog armed after non-assistant rawResponseItem/completed notifications with no active items or app-server requests, and adds regression coverage for the stall, source-reply, and raw-assistant release paths.

PR surface: Source +26, Tests +159. Total +185 across 2 files.

Reproducibility: yes. by source inspection: current main touches progress for rawResponseItem/completed but then lets the same notification fall into the catch-all completion-watch disarm path unless it is a raw tool output, raw assistant guard, or final item completion. I did not run the new regression test because this review is read-only.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • This changes when OpenClaw aborts a quiet Codex turn; a too-broad raw-completion classification could create premature user-visible timeouts, although this branch narrows the condition to current-turn, no-active-items, no-active-requests and protects source-reply/raw-assistant paths with tests.
  • The current head was force-pushed after an earlier review cycle, so maintainers should make sure required checks are green on c374edf31192bd716a0e8a0dcd34d796e0d34734 before landing.

Maintainer options:

  1. Accept the narrowed watchdog boundary (recommended)
    Land once exact-head required checks are green, treating the no-active-items/no-active-requests guard and regression tests as sufficient protection against premature Codex turn aborts.
  2. Require another exact-head runtime proof
    Pause landing until a maintainer reruns the focused Codex app-server test and changed-check lane on c374edf31192bd716a0e8a0dcd34d796e0d34734.
  3. Defer to a wider timeout design
    Close or pause this PR only if maintainers want to redesign Codex completion and terminal idle guards together instead of landing this narrow recovery path.

Next step before merge
There is no narrow automated code repair to request; maintainers should review the latest head, ensure exact-head checks are green, and decide whether to land the availability-sensitive watchdog change.

Security
Cleared: The diff only changes Codex app-server timeout logic and tests; it does not alter dependencies, CI, secrets handling, permissions, package scripts, or code-download surfaces.

Review details

Best possible solution:

Land the focused watchdog fix after exact-head checks pass and maintainers accept the narrowed availability boundary, then close the linked raw-completion stall issue as fixed by the merged PR.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection: current main touches progress for rawResponseItem/completed but then lets the same notification fall into the catch-all completion-watch disarm path unless it is a raw tool output, raw assistant guard, or final item completion. I did not run the new regression test because this review is read-only.

Is this the best way to solve the issue?

Yes, the proposed solution is the narrowest maintainable fix I found: it only preserves the short watchdog for current-turn raw completions with no active items or requests, while keeping raw assistant release and message_tool_only source-reply behavior separate.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f327df866cb0.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments include after-fix live gateway logs showing the 60s watchdog returning timeout feedback, plus focused regression and Testbox evidence; contributors should continue redacting private logs when adding more proof.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The PR addresses Codex app-server hangs that leave agent turns stuck until a long terminal timeout, affecting an active runtime workflow.
  • merge-risk: 🚨 availability: Changing timeout arming/disarming can alter whether valid Codex turns continue waiting or get aborted, so green unit tests alone do not settle the runtime availability risk.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body and comments include after-fix live gateway logs showing the 60s watchdog returning timeout feedback, plus focused regression and Testbox evidence; contributors should continue redacting private logs when adding more proof.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments include after-fix live gateway logs showing the 60s watchdog returning timeout feedback, plus focused regression and Testbox evidence; contributors should continue redacting private logs when adding more proof.
Evidence reviewed

PR surface:

Source +26, Tests +159. Total +185 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 27 1 +26
Tests 1 161 2 +159
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 188 3 +185

What I checked:

  • Current main disarms the short completion watchdog for the reported raw completion shape: On current main, handleNotification only preserves the short completion watchdog for raw tool output, post-tool raw assistant completion, source-reply reasoning item completion, or final item/completed; a current-turn non-assistant rawResponseItem/completed falls through to the catch-all disarm block. (extensions/codex/src/app-server/run-attempt.ts:2496, f327df866cb0)
  • PR head adds the narrowed raw completion watchdog state: The PR head adds rawResponseItemCompletedWithNoActiveItems, requiring the active turn, rawResponseItem/completed, zero active items, zero active app-server requests, and exclusion from assistant-release guard paths before arming the completion idle watch and excluding that notification from the catch-all disarm. (extensions/codex/src/app-server/run-attempt.ts:2452, c374edf31192)
  • PR head preserves active request and source-reply guard behavior: The surrounding request handler already suppresses completion timers while app-server turn requests are active, and the PR adds a raw reasoning mirror path to keep message_tool_only source replies on the longer post-reasoning guard. (extensions/codex/src/app-server/run-attempt.ts:2631, c374edf31192)
  • Regression tests cover the repaired and protected paths: The PR adds a targeted test asserting a non-assistant raw completion fires the short completion idle timeout before the terminal timeout, plus a raw reasoning mirror test proving message_tool_only waits for turn/completed rather than aborting early. (extensions/codex/src/app-server/run-attempt.test.ts:5474, c374edf31192)
  • Upstream Codex contract supports the event ordering risk: Upstream app-server protocol exposes rawResponseItem/completed separately from turn/completed, and upstream event handling emits raw response item completions before the final turn completion notification; upstream core also treats stream closure before response.completed as a distinct error path.
  • Repository policy and scoped extension policy were applied: The root AGENTS.md requires whole-surface review, dependency contract proof, real behavior proof, and merge-risk treatment for availability-sensitive PRs; extensions/AGENTS.md confirms this is plugin-owned code and the change stays within the extension boundary. (AGENTS.md:1, f327df866cb0)

Likely related people:

  • steipete: Peter Steinberger is the dominant recent committer on extensions/codex/src/app-server/run-attempt.ts, authored several Codex app-server lifecycle commits, committed the current PR head, and provided the maintainer refresh comment for this PR. (role: recent area contributor and reviewer; confidence: high; commits: 659bcc5e5b59, 545490c5920d, 9ac7a0398213; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
  • vincentkoc: Vincent Koc has recent merged history in the Codex app-server/auth startup path, including fix(codex): reuse bound auth profile for app-server startup, and is assigned on the PR timeline. (role: adjacent Codex runtime contributor; confidence: medium; commits: f1cc8f0cfc7c, 859eb0666282; files: extensions/codex/src/app-server/run-attempt.ts)
  • Andy Ye: Current checkout blame attributes the existing timeout/watchdog block to commit 41fa603aa8; the shallow/grafted history makes this a weaker ownership signal than the targeted Codex app-server commits. (role: current-line provenance signal; confidence: low; commits: 41fa603aa857; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Pearl Review Wisp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: purrs at green checks.
Image traits: location artifact grotto; accessory lint brush; palette charcoal, cyan, and signal green; mood proud; pose peeking out from the egg shell; shell matte ceramic shell; lighting bright celebratory glints; background subtle branch markers.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Pearl Review Wisp in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@Marvinthebored

Copy link
Copy Markdown
Contributor Author

Additional evidence: code path analysis

The stall logs above come from the gateway's built-in stall detector (not a custom script). Here's the exact code path that leads to the 30-minute hang:

Notification sequence during the stall (from gateway stderr tmux pane):

  1. Binary emits rawResponseItem/completed — gateway calls touchTurnCompletionActivity(), resetting timestamps
  2. handleNotification evaluates arm/disarm conditions:
    • postToolRawAssistantCompletionNeedsTerminalGuardfalse (requires isRawAssistantCompletionNotification which needs type=message, role=assistant, text preview — the final rawResponseItem/completed didn't qualify)
    • shouldRearmCompletionIdleWatchAfterLastCurrentTurnItemfalse (requires notification.method === "item/completed", but this is rawResponseItem/completed)
    • None of the arm conditions match → falls through to the catch-all disarm block (line 2508)
  3. Disarm check passes: turnCompletionIdleWatchArmed && !rawToolOutputCompletion && !postToolRawAssistantCompletionNeedsTerminalGuard && !shouldRearmCompletionIdleWatchAfterLastCurrentTurnItem → all true → disarmTurnCompletionIdleWatch() fires
  4. Only turnTerminalIdleWatch (30 min) remains active
  5. Binary goes silent — turn/completed never arrives
  6. Gateway stall detector reports every 30s but has no recovery action

The fix adds rawResponseItemCompletedWithNoActiveItems to both the arm path (line 2496) and the disarm exclusion (line 2516), so the 60s completion idle watch stays armed after the binary's last notification.

Why there's no "after" reproduction yet: The stall is intermittent — it depends on the OpenAI Responses API failing to deliver response.completed to the binary. It happened once during ~45 minutes of testing. With the fix installed, the next occurrence will produce a 60s completion idle timeout instead of a 30-minute hang — I'll update this PR when that happens.

The "before" evidence is the gateway's own stall detector logs showing 377s+ of silence after rawResponseItem/completed with recovery=none escalating to recovery=checking.

@Marvinthebored

Copy link
Copy Markdown
Contributor Author

Real behavior proof: targeted regression test

Added test(codex): verify completion idle watch arms after non-assistant rawResponseItem/completed — this exercises the exact stall scenario:

  1. Turn starts with a tool call (turnCrossedToolHandoff = true)
  2. Tool call completes (activeTurnItemIds → empty)
  3. rawResponseItem/completed arrives with type: "reasoning" — does NOT qualify as postToolRawAssistantCompletionNeedsTerminalGuard (which requires type: "message" + role: "assistant" + text preview)
  4. Binary goes silent — no turn/completed arrives

Without the fix: The rawResponseItem/completed hits the catch-all disarm block and disarms the completion idle watch. Only the 30-minute terminal timeout catches the stall. The test would fail because fireTurnCompletionIdleTimeout never fires.

With the fix: rawResponseItemCompletedWithNoActiveItems keeps the completion idle watch armed (5ms in test). The test asserts:

  • result.timedOut === true
  • result.promptError === "codex app-server turn idle timed out waiting for turn/completed"
  • The completion idle timeout fired ("codex app-server turn idle timed out waiting for completion" logged with timeoutMs: 5)
  • The terminal idle timeout did NOT fire (confirming the shorter watch caught it first)
 ✓ arms completion idle watch after non-assistant rawResponseItem/completed with no active items (415ms)

 Test Files  1 passed (1)
      Tests  230 passed (230)

Full vitest run: all 230 existing tests still pass — no regressions.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 27, 2026
@Marvinthebored

Copy link
Copy Markdown
Contributor Author

Uptime and stability report

Gateway process (pid 30008) has been running continuously since Wed May 27 08:16:57 20262h 10m+ uptime on the patched build (main + #87070 + this fix).

$ ps -o pid,lstart,etime -p 30008
  PID STARTED                       ELAPSED
30008 Wed May 27 08:16:57 2026     02:10:10

However, no codex turns have executed on this gateway instance — the last codex activity was at 03:30 before the restart, so there's been no opportunity for the binary stall to recur. The absence of crashes is real but doesn't constitute a codex-specific live proof.

Evidence summary for this PR:

Type Status Details
Before (stall observed) Gateway stall detector: rawResponseItem/completed → 377s+ silence, recovery=none
Before (false timeouts) 4 idle timeouts on pre-fix builds (14:10, 16:09, 22:46, 23:07) — completion idle watch was being disarmed
Code path analysis Traced the exact disarm block that kills the watch on non-assistant rawResponseItem/completed
Targeted regression test arms completion idle watch after non-assistant rawResponseItem/completed with no active items — 230/230 pass, proves completion idle timeout fires (5ms) before terminal timeout (500ms)
After (live reproduction) Gateway stable 2h+, but no codex turns have run on this instance yet

The regression test is the definitive proof — it exercises the exact notification sequence (rawResponseItem/completed type reasoning, all items done, no active requests) and asserts the completion idle watch stays armed. Will update if a live codex stall is caught by the fix.

@Marvinthebored

Copy link
Copy Markdown
Contributor Author

Live evidence: completion idle watch catching a binary stall (2026-05-27)

With this fix deployed on a live OpenClaw gateway (v2026.5.26-beta.1 + cherry-picked fix), the codex binary stalled during a real user turn:

Timeline:

  • 12:26:45 — turn started, context-engine thread binding written
  • 12:26:46 — binary sent thread/tokenUsage/updated (token count), then went completely silent
  • 12:30:31 — stall detector flagged the session (140s old, 139s since last progress)
  • 12:37:50completion idle watch fired at exactly 60s after the last item/completed notification

Verbose log (timeout event):

timeoutMs: 60000, idleMs: 60000
lastActivityReason: "notification:item/completed"

stderr diagnostic confirming the stall:

[diagnostic] stalled session: sessionId=fc1d513c state=processing age=140s
  activeWorkKind=model_call
  lastProgress=codex_app_server:notification:thread/tokenUsage/updated
  lastProgressAge=139s recovery=none

Network evidence (Clash Verge service log): zero outbound connections to OpenAI/Cloudflare between 12:26:45 and 12:36:23. The binary never even attempted the API call — it froze internally after computing token usage.

User-facing result: Marvin received the timeout message ("Codex stopped before confirming the turn was complete") after ~60s and could retry. Without this fix, the session would have hung for 30 minutes on the terminal idle timeout before the user got any feedback.

@steipete

Copy link
Copy Markdown
Contributor

Maintainer refresh on the cleaned branch.

Behavior addressed: Codex app-server stalls where a raw response item completes but the turn never reaches turn/completed; OpenClaw now arms the completion idle watchdog only for non-assistant rawResponseItem/completed notifications with no active items/requests. Raw assistant final messages and commentary remain on the existing assistant-release/commentary paths.

Still needed after checking upstream Codex: yes, but only in this narrow form. Upstream still defines and emits rawResponseItem/completed before turn/completed (codex-rs/app-server-protocol/src/protocol/common.rs, codex-rs/app-server-protocol/src/protocol/v2/item.rs, codex-rs/app-server/src/bespoke_event_handling.rs). Core streaming can also produce output-item-done events before response completion, and upstream has retry coverage for streams closing before response.completed (codex-rs/core/src/session/turn.rs, codex-rs/core/tests/suite/stream_no_completed.rs). So OpenClaw still needs a guard for this incomplete-turn shape; the stale broader queued-terminal changes from the old branch were dropped.

Real environment tested: local OpenClaw checkout plus Blacksmith Testbox through pnpm check:changed; upstream Codex source reviewed from the local upstream checkout.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose
  • pnpm check:changed
  • .agents/skills/autoreview/scripts/autoreview --mode local --prompt 'Review final focused fix for OpenClaw PR #87079. Context: upstream ../codex still emits rawResponseItem/completed for ResponseItem before turn/completed. OpenClaw patch keeps completion idle watch armed only for rawResponseItem/completed with no active items/requests that is not raw tool output, not a raw assistant message/release, and not existing source-reply/item-completed cases. Verify raw assistant release, assistant commentary, and queued-terminal watchdog behavior remain intact.'

Evidence after fix:

Observed result after fix: the new non-assistant raw completion case arms the completion idle watchdog, raw assistant final completion still releases the session, assistant commentary does not release or arm the short completion watchdog, and the current queued-terminal watchdog behavior remains intact.

What was not tested: no live Codex provider stall was replayed after the patch; this is source-contract, focused regression-test, autoreview, and Testbox proof.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@steipete steipete force-pushed the fix/codex-binary-stall-idle-watch branch from 7a210a0 to 3cd43c6 Compare May 27, 2026 06:40
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 27, 2026
@steipete steipete force-pushed the fix/codex-binary-stall-idle-watch branch 2 times, most recently from 7a210a0 to 3cd43c6 Compare May 27, 2026 06:50
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 27, 2026
@clawsweeper clawsweeper Bot added status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 27, 2026
@steipete steipete force-pushed the fix/codex-binary-stall-idle-watch branch from 3cd43c6 to c374edf Compare May 27, 2026 07:11
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 27, 2026
Username and others added 5 commits May 27, 2026 08:21
… with no active items

When the codex binary emits rawResponseItem/completed and all tracked
items have completed (activeTurnItemIds empty, no active requests), the
binary should deliver turn/completed imminently. Previously, a
rawResponseItem/completed that didn't qualify as a post-tool assistant
completion would actively disarm the completion idle watch, leaving only
the 30-minute terminal timeout to catch a stalled binary. This caused
turns to hang for up to 30 minutes when the OpenAI Responses API fails
to deliver response.completed to the binary.

Now, rawResponseItem/completed with no active items arms the 60s
completion idle watch and is excluded from the disarm path, so stalled
binaries are detected in 60s instead of 30 minutes.

Refs openclaw#87071

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…wResponseItem/completed

Regression test for the binary stall fix: when rawResponseItem/completed
arrives with a non-assistant type (e.g. "reasoning") and all tracked
items have completed, the completion idle watch must stay armed so the
stall is caught in 60s, not 30 minutes.

Refs openclaw#87071

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@steipete steipete force-pushed the fix/codex-binary-stall-idle-watch branch from c374edf to bd02f8e Compare May 27, 2026 07:23
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer landing proof for exact head bd02f8ebb099f4747dfdd6c5f5815d88c4519121.

Behavior addressed: Codex app-server stalls after rawResponseItem/completed without turn/completed; the PR keeps the completion idle watchdog armed only for no-active-items/no-active-requests raw completion states while preserving raw assistant release and source-reply guard behavior.

Real environment tested: local OpenClaw checkout on current main plus Blacksmith Testbox changed-check gate.

Exact steps or command run after this patch:

  • git diff --check origin/main...HEAD
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose
  • pnpm changed:lanes --json
  • pnpm check:changed
  • gh pr checks 87079 --watch --fail-fast=false

Evidence after fix:

Observed result after fix: the new raw non-assistant completion path arms the completion idle watchdog, source-reply raw reasoning stays on the longer guard, and raw assistant completion remains releasable.

What was not tested: no new live provider stall replay during landing; prior maintainer refresh already checked upstream ../codex source and live proof remains in the PR body/comments.

Thanks @Marvinthebored.

@steipete steipete merged commit e718d47 into openclaw:main May 27, 2026
94 checks passed
@steipete

Copy link
Copy Markdown
Contributor

Landed on main as e718d471f287c0929f52644ae9816fe395e9c26a.

Proof recorded above for exact PR head bd02f8ebb099f4747dfdd6c5f5815d88c4519121: focused Vitest passed (234 tests), Testbox changed check passed (tbx_01ksm54cdprq435mfv4r3rep8k), and exact-head PR CI/CodeQL were green before merge.

Thanks @Marvinthebored.

@steipete

Copy link
Copy Markdown
Contributor

Landed on main.

  • Head SHA: bd02f8e
  • Merge commit: e718d47
  • Local proof: git diff --check origin/main...HEAD before merge
  • Local proof: node scripts/run-vitest.mjs --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose --testTimeout=5000 -t "(arms completion idle watch after non-assistant rawResponseItem/completed with no active items|keeps waiting after reasoning and its raw mirror complete before a visible message call|keeps waiting after reasoning completes before a visible message call|releases the session after a raw assistant response item without turn completion)" passed with 4 tests
  • Review proof: .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main clean
  • GitHub proof: fresh normal CI jobs on bd02f8e were green before merge; unrelated Critical Quality (network-runtime-boundary) was still pending without logs and outside the touched Codex extension surface

Thanks @Marvinthebored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants