fix(codex): arm completion idle watch for stalled binary after rawResponseItem/completed#87079
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 3:19 AM ET / 07:19 UTC. Summary PR surface: Source +26, Tests +159. Total +185 across 2 files. Reproducibility: yes. by source inspection: current main touches progress for Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land the focused watchdog fix after exact-head checks pass and maintainers accept the narrowed availability boundary, then close the linked raw-completion stall issue as fixed by the merged PR. Do we have a high-confidence way to reproduce the issue? Yes, by source inspection: current main touches progress for Is this the best way to solve the issue? Yes, the proposed solution is the narrowest maintainable fix I found: it only preserves the short watchdog for current-turn raw completions with no active items or requests, while keeping raw assistant release and AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against f327df866cb0. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +26, Tests +159. Total +185 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Pearl Review Wisp Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
Additional evidence: code path analysisThe stall logs above come from the gateway's built-in stall detector (not a custom script). Here's the exact code path that leads to the 30-minute hang: Notification sequence during the stall (from gateway stderr tmux pane):
The fix adds Why there's no "after" reproduction yet: The stall is intermittent — it depends on the OpenAI Responses API failing to deliver The "before" evidence is the gateway's own stall detector logs showing 377s+ of silence after |
Real behavior proof: targeted regression testAdded
Without the fix: The With the fix:
Full vitest run: all 230 existing tests still pass — no regressions. |
Uptime and stability reportGateway process (pid 30008) has been running continuously since However, no codex turns have executed on this gateway instance — the last codex activity was at Evidence summary for this PR:
The regression test is the definitive proof — it exercises the exact notification sequence ( |
Live evidence: completion idle watch catching a binary stall (2026-05-27)With this fix deployed on a live OpenClaw gateway (v2026.5.26-beta.1 + cherry-picked fix), the codex binary stalled during a real user turn: Timeline:
Verbose log (timeout event): stderr diagnostic confirming the stall: Network evidence (Clash Verge service log): zero outbound connections to OpenAI/Cloudflare between 12:26:45 and 12:36:23. The binary never even attempted the API call — it froze internally after computing token usage. User-facing result: Marvin received the timeout message ("Codex stopped before confirming the turn was complete") after ~60s and could retry. Without this fix, the session would have hung for 30 minutes on the terminal idle timeout before the user got any feedback. |
e0c4587 to
7a210a0
Compare
|
Maintainer refresh on the cleaned branch. Behavior addressed: Codex app-server stalls where a raw response item completes but the turn never reaches Still needed after checking upstream Codex: yes, but only in this narrow form. Upstream still defines and emits Real environment tested: local OpenClaw checkout plus Blacksmith Testbox through Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix: the new non-assistant raw completion case arms the completion idle watchdog, raw assistant final completion still releases the session, assistant commentary does not release or arm the short completion watchdog, and the current queued-terminal watchdog behavior remains intact. What was not tested: no live Codex provider stall was replayed after the patch; this is source-contract, focused regression-test, autoreview, and Testbox proof. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
7a210a0 to
3cd43c6
Compare
7a210a0 to
3cd43c6
Compare
3cd43c6 to
c374edf
Compare
… with no active items When the codex binary emits rawResponseItem/completed and all tracked items have completed (activeTurnItemIds empty, no active requests), the binary should deliver turn/completed imminently. Previously, a rawResponseItem/completed that didn't qualify as a post-tool assistant completion would actively disarm the completion idle watch, leaving only the 30-minute terminal timeout to catch a stalled binary. This caused turns to hang for up to 30 minutes when the OpenAI Responses API fails to deliver response.completed to the binary. Now, rawResponseItem/completed with no active items arms the 60s completion idle watch and is excluded from the disarm path, so stalled binaries are detected in 60s instead of 30 minutes. Refs openclaw#87071 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…wResponseItem/completed Regression test for the binary stall fix: when rawResponseItem/completed arrives with a non-assistant type (e.g. "reasoning") and all tracked items have completed, the completion idle watch must stay armed so the stall is caught in 60s, not 30 minutes. Refs openclaw#87071 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c374edf to
bd02f8e
Compare
|
Maintainer landing proof for exact head Behavior addressed: Codex app-server stalls after Real environment tested: local OpenClaw checkout on current Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix: the new raw non-assistant completion path arms the completion idle watchdog, source-reply raw reasoning stays on the longer guard, and raw assistant completion remains releasable. What was not tested: no new live provider stall replay during landing; prior maintainer refresh already checked upstream Thanks @Marvinthebored. |
|
Landed on Proof recorded above for exact PR head Thanks @Marvinthebored. |
|
Landed on main.
Thanks @Marvinthebored. |
Summary
rawResponseItem/completedarrives with all tracked items completed and no active app-server requests.message_tool_onlysource-reply guard for raw reasoning completions so valid turns still have time to send the visible message tool call.Root cause
When the Codex binary relays
rawResponseItem/completedand all tracked items are already complete, the binary should deliverturn/completedshortly after. If the binary goes silent beforeturn/completed, currentmaincan disarm the completion idle watch because the notification is not a raw assistant completion, not raw tool output, and not the finalitem/completedpath. That leaves only the long terminal idle timeout for recovery.Changes
extensions/codex/src/app-server/run-attempt.tsnow treats current-turnrawResponseItem/completedwith no active items and no active app-server requests as a completion-watch state. It arms the completion idle watch and excludes that notification from the generic disarm block.For
message_tool_onlyturns, raw reasoning completions are routed through the existing post-reasoning source-reply timeout so they do not prematurely hit the short completion idle timeout before Codex can call the visible message tool.extensions/codex/src/app-server/run-attempt.test.tsadds regressions for atype: "reasoning"raw completion that should time out promptly in normal mode, and Codex's real reasoning item lifecycle followed by its raw mirror in message-tool-only mode where the turn should keep waiting untilturn/completedarrives.Real behavior proof
Behavior addressed: A real Codex app-server turn can stall after final progress or
rawResponseItem/completedwithout deliveringturn/completed; this patch keeps the 60s completion idle watchdog armed so the user gets timeout/retry feedback instead of a long hang.Real environment tested: Live OpenClaw gateway running a cherry-picked build containing this fix on 2026-05-27, plus local macOS source checkout with Node/Vitest regression proof.
Exact steps or command run after this patch: Live gateway ran a real user Codex turn on the patched build; locally,
node scripts/run-vitest.mjs --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose --testTimeout=5000 -t "(arms completion idle watch after non-assistant rawResponseItem/completed with no active items|keeps waiting after reasoning and its raw mirror complete before a visible message call|keeps waiting after reasoning completes before a visible message call|releases the session after a raw assistant response item without turn completion)".Evidence after fix: Live log reported
timeoutMs: 60000, idleMs: 60000withlastActivityReason: "notification:item/completed"; stderr diagnostics showed the session had stalled instate=processingwithactiveWorkKind=model_calland no recovery before the watchdog fired. The local regressions assert the completion idle timeout fires withtimeoutMs: 5andlastActivityReason: "notification:rawResponseItem/completed", while the terminal idle timeout does not fire. The companion regressions assert Codex's reasoning item lifecycle plus raw mirror inmessage_tool_onlymode does not interrupt after the short completion timeout, and a raw assistant completion still releases through the assistant-output path even when the completion timeout is shorter.Observed result after fix: The patched live gateway returned timeout feedback after roughly 60s and allowed retry instead of hanging until the long terminal timeout. The regressions catch the specific non-assistant raw completion path from this PR and the source-reply guard edge found during maintainer review.
What was not tested: A deterministic live reproduction of the exact intermittent
type: "reasoning"rawResponseItem/completedstall; the live stall proof covered the same completion-watch recovery behavior, and the targeted regressions cover the exact raw notification shapes plus the raw assistant release path that must remain unchanged.Verification
git diff --check origin/main...HEADnode scripts/run-vitest.mjs --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose --testTimeout=5000 -t "(arms completion idle watch after non-assistant rawResponseItem/completed with no active items|keeps waiting after reasoning and its raw mirror complete before a visible message call|keeps waiting after reasoning completes before a visible message call|releases the session after a raw assistant response item without turn completion)".agents/skills/autoreview/scripts/autoreview --mode branch --base origin/mainRefs #87071, #86948, #87022, #87070.