Skip to content

fix(codex): prevent false idle timeouts under notification queue delay#87070

Closed
Marvinthebored wants to merge 1 commit into
openclaw:mainfrom
Marvinthebored:fix/codex-turn-completion-idle-race
Closed

fix(codex): prevent false idle timeouts under notification queue delay#87070
Marvinthebored wants to merge 1 commit into
openclaw:mainfrom
Marvinthebored:fix/codex-turn-completion-idle-race

Conversation

@Marvinthebored

Copy link
Copy Markdown
Contributor

Summary

  • Check terminalTurnNotificationQueued in all three idle timeout handlers so they don't fire when turn/completed is already queued but not yet processed
  • Touch turnCompletionLastActivityAt / turnAttemptLastProgressAt at enqueue time in enqueueNotification, not just at process time

Companion to #87022 — that PR bounds the fallout (client retirement, failover policy) when a timeout fires; this PR prevents the false timeout from firing in the first place.

Root cause

waitForCodexNotificationDispatchTurn() yields via setImmediate (introduced in ea16a5e9e1 / #82333). Under event-loop saturation, turn/completed sits in the notification queue past the 60s idle threshold. The idle timer checks process-time timestamps, but the notification hasn't been processed yet — only enqueued. The terminalTurnNotificationQueued flag (added in 4cbf616d30) already marks this at enqueue time, but the idle timeout handlers don't check it.

Changes

extensions/codex/src/app-server/run-attempt.ts:

  1. fireTurnAttemptIdleTimeout: add terminalTurnNotificationQueued to guard condition
  2. fireTurnCompletionIdleTimeout: add terminalTurnNotificationQueued to guard condition
  3. fireTurnTerminalIdleTimeout: add terminalTurnNotificationQueued to guard condition
  4. enqueueNotification: touch turnCompletionLastActivityAt and turnAttemptLastProgressAt at receive time for notifications matching the active turn, so the idle timer sees live binary I/O even when the queue is backed up

Real behavior proof

Before (on 5a7d5c6def, includes #87022): Two false idle timeouts in 25 minutes, TUI channel, openai/gpt-5.5:

22:46:34 [WARN] codex app-server turn idle timed out waiting for completion
22:46:34 [WARN] codex app-server client retired after timed-out turn
23:07:37 [WARN] codex app-server turn idle timed out waiting for completion
23:07:37 [WARN] codex app-server client retired after timed-out turn

Monitor CSV (pre-fix):

timestamp  cpu_pct  total_fds  tcp_fds  tcp_established  event_loop_warn
23:59:12   102.7    81         5        3                0
23:59:19   103.7    81         5        3                0
23:59:25   105.8    81         5        3                0
23:59:58   145.9    90         6        3                0
00:00:37   187.9    89         6        3                0

After (fix applied, gateway restarted at 23:56): Turn ran 4m 38s+ past the 60s threshold, zero false timeouts:

timestamp  cpu_pct  total_fds  tcp_fds  tcp_established  event_loop_warn
00:01:17   0.2      89         6        3                0
00:02:10   0.1      89         6        3                0
00:03:09   0.1      89         6        3                0
00:04:08   0.1      89         6        3                0
00:05:07   0.2      89         6        3                0
00:06:06   0.3      89         6        3                0

Zero timeouts, CPU 0.1–1.4%, fds stable at 89–90, TCP stable at 6–7.

Separate issue noted

During post-fix testing, a genuine binary stall was observed — the codex binary stops sending notifications after rawResponseItem/completed and turn/completed never arrives (377s+). This fix correctly does NOT suppress that timeout, since terminalTurnNotificationQueued is never set when turn/completed genuinely never arrives. Filed separately.

Refs #86948, #87022.
cc @steipete @Peetiegonzalez

openclaw#86948)

The completion-idle timer (60 s) fires based on when notifications are
*processed* inside the chained notification queue, not when they are
*received* from the app-server binary over stdio.  Since ea16a5e
added a `setImmediate` yield between each queued dispatch, every
notification is delayed by at least one event-loop tick.  Under heavy
load (tool-call-intensive turns) the ticks accumulate, and the
`turn/completed` notification can arrive from the binary but sit in the
queue for longer than 60 s — at which point the idle timer fires and
aborts the turn.

Fix:
1. Touch `turnCompletionLastActivityAt` and `turnAttemptLastProgressAt`
   at *enqueue* time (when readline delivers the line) so the idle timer
   sees live binary I/O even when the queue is backed up.
2. Guard all three idle-timeout handlers (`fireTurnCompletionIdleTimeout`,
   `fireTurnAttemptIdleTimeout`, `fireTurnTerminalIdleTimeout`) with the
   existing `terminalTurnNotificationQueued` flag so they never fire when
   `turn/completed` has already been received but not yet processed.

Closes openclaw#86948

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 26, 2026, 7:19 PM ET / 23:19 UTC.

Summary
The PR changes extensions/codex/src/app-server/run-attempt.ts so active-turn notifications update idle-watch activity at enqueue time and terminal queued notifications suppress the Codex idle timeout abort handlers.

PR surface: Source +18. Total +18 across 1 file.

Reproducibility: yes. Source inspection and the PR's copied live logs give a high-confidence reproduction path: current main waits for processed notification activity while turn/completed can be received and queued behind setImmediate dispatch past the 60s idle threshold.

Review metrics: 2 noteworthy metrics.

  • Idle watchdog guard changes: 3 handlers changed. All Codex app-server idle abort paths now depend on queued terminal notification state, which is the availability-sensitive behavior reviewers should inspect.
  • Proof parser status: 1 required proof check failing. The copied logs are convincing for review, but the branch may still need exact proof-field formatting before GitHub will allow merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Put the existing proof under the exact real-behavior-proof field labels so the repository check passes.
  • Add a focused harness regression test for a queued turn/completed delayed beyond the idle threshold.

Risk before merge

  • Merging changes Codex app-server watchdog behavior: once a terminal turn notification is queued, the three idle timeout abort paths no longer fire, so maintainers should explicitly accept that queued terminal processing is allowed to finish rather than be killed by the watchdog.
  • The PR body contains substantively useful live log proof, but the repository Real behavior proof check is currently failing because the proof is not under the exact parsed field labels.
  • The diff does not add a focused regression test for a turn/completed notification queued longer than the idle threshold.

Maintainer options:

  1. Land With Availability Acceptance (recommended)
    Maintainers can accept the runtime tradeoff because the patch only suppresses aborts after an active-turn terminal notification has already been received and queued.
  2. Add Queue-Delay Regression Coverage
    Before merge, add a harness test that delays notification processing past the completion-idle threshold after turn/completed is enqueued and asserts the run does not time out.
  3. Repair Proof Check Formatting
    Before merge, move the existing before/after log evidence into the exact real-behavior-proof fields required by the repository check if branch protection requires that check to pass.

Next step before merge
No ClawSweeper repair lane is needed for the patch itself; maintainers need to review the availability tradeoff, resolve the proof-check formatting if required, and wait for or rerun CI.

Security
Cleared: The diff only changes Codex app-server timeout bookkeeping in TypeScript and does not touch dependencies, scripts, CI, secrets, permissions, packaging, or artifact execution paths.

Review details

Best possible solution:

Land the focused Codex app-server fix after maintainer availability review, ideally with exact proof-field formatting and a targeted queue-delay regression test or explicit acceptance that the live proof is enough for this race.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection and the PR's copied live logs give a high-confidence reproduction path: current main waits for processed notification activity while turn/completed can be received and queued behind setImmediate dispatch past the 60s idle threshold.

Is this the best way to solve the issue?

Yes. The proposed change is a narrow owner-boundary fix that reuses existing terminal-queued state and updates active-turn receipt timestamps instead of changing provider routing or fallback policy; a focused regression test would make it stronger.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 068d88c14261.

Label changes

Label justifications:

  • P1: The PR targets an urgent Codex runtime regression that can drop active GPT-5.5 turns under event-loop pressure.
  • merge-risk: 🚨 availability: The patch changes when Codex app-server watchdogs abort or allow a run to continue after terminal notification receipt.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body includes before/after copied runtime logs and monitor output showing false timeouts before the patch and a post-fix Codex TUI turn running past the 60s threshold with zero false timeouts.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes before/after copied runtime logs and monitor output showing false timeouts before the patch and a post-fix Codex TUI turn running past the 60s threshold with zero false timeouts.
Evidence reviewed

PR surface:

Source +18. Total +18 across 1 file.

View PR surface stats
Area Files Added Removed Net
Source 1 19 1 +18
Tests 0 0 0 0
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 1 19 1 +18

What I checked:

Likely related people:

  • steipete: Recent merged Codex app-server timeout and terminal notification work in this path is tied to commits authored by Peter Steinberger, including the timeout fallout fix and terminal notification premarking. (role: recent area contributor; confidence: high; commits: 5a7d5c6defe5, 4cbf616d3049; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/shared-client.ts, src/agents/pi-embedded-runner/run/failover-policy.ts)
  • joshavant: The queued notification macrotask yield that made receipt-time versus process-time ordering important landed in the merged notification projection PR authored by Josh Avant. (role: regression-provenance contributor; confidence: medium; commits: ea16a5e9e10c; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Velvet Test Hopper

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: guards the happy path.
Image traits: location artifact grotto; accessory little merge flag; palette violet, aqua, and starlight; mood focused; pose sitting proudly on a smooth stone; shell starlit enamel shell; lighting calm overcast light; background gentle dashboard dots.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Velvet Test Hopper in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@steipete

Copy link
Copy Markdown
Contributor

Thanks for the focused fix. I used the useful part of this patch in #87096, but this PR also made the queued-terminal flag suppress the full turn-attempt watchdog.

That changes the failure mode from a false short idle timeout into a possible permanent run/lane hang if turn/completed is received while an earlier notification projection is stuck behind an awaited channel/progress callback. The replacement keeps the receive-time activity touch and suppresses only the short completion/terminal idle guards; the broader attempt watchdog remains independent so a wedged queue can still release the run.

Closing this in favor of #87096 with a regression test for terminal completion queued behind projection. Thanks again for finding the race.

@steipete steipete closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: XS status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants