Skip to content

fix(codex): yield app-server notification projection#82333

Merged
joshavant merged 2 commits into
mainfrom
fix/codex-notification-yield-81936
May 15, 2026
Merged

fix(codex): yield app-server notification projection#82333
joshavant merged 2 commits into
mainfrom
fix/codex-notification-yield-81936

Conversation

@joshavant

Copy link
Copy Markdown
Contributor

Summary

  • Yield Codex app-server projector work to a macrotask between embedded-run notifications.
  • Preserve pre-turn notification capture so rate-limit/error notifications remain available when turn/start fails.
  • Add focused regression coverage for queued notification projection.

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -t "yields a macrotask|preserves Codex usage-limit reset details"
  • node scripts/run-oxlint.mjs extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts

Behavior addressed: Codex app-server account and MCP status notifications no longer project synchronously through the embedded-run notification queue while still preserving pre-turn rate-limit capture.
Real environment tested: Local targeted Codex app-server harness tests in this checkout.
Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -t "yields a macrotask|preserves Codex usage-limit reset details"; node scripts/run-oxlint.mjs extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
Evidence after fix: The targeted Vitest command passed 2 tests with 119 skipped; oxlint reported 0 warnings and 0 errors.
Observed result after fix: Notification projection yields to the event loop after synchronous queue bookkeeping, and the usage-limit test preserves rate-limit reset details when turn/start fails.
What was not tested: Live Codex OAuth/MCP gateway run, Telegram delivery, AWS health checks, and a production cron-triggered embedded_run.

Fixes #81936

@clawsweeper

clawsweeper Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR adds a macrotask yield before Codex app-server notification projection, preserves pre-turn notification capture, adds focused run-attempt regression coverage, and updates the changelog.

Reproducibility: no. high-confidence live reproduction was run in this review. The linked issue gives concrete production liveness logs and current source confirms the queued notification projection path, so this is source-reproducible rather than fully reproduced here.

Real behavior proof
Needs real behavior proof before merge: The PR body only reports targeted harness tests and oxlint, so after-fix real runtime proof with private details redacted is still needed before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
The protected maintainer label and missing real behavior proof make this a human merge decision; no narrow automated repair finding was identified.

Security
Cleared: The diff touches Codex app-server TypeScript, a focused test, and changelog only; no security-sensitive or supply-chain surface changed.

Review details

Best possible solution:

Land a narrow Codex app-server notification-yield fix after maintainer review and live or diagnostic proof that account and MCP status notifications no longer stall gateway liveness.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was run in this review. The linked issue gives concrete production liveness logs and current source confirms the queued notification projection path, so this is source-reproducible rather than fully reproduced here.

Is this the best way to solve the issue?

Yes, this is a narrow maintainable direction for the reported queue starvation: keep pre-turn capture synchronous and yield before projector work. The missing piece is real runtime proof that the liveness symptom improves under the reported Codex/MCP or cron path.

Acceptance criteria:

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -t "yields a macrotask|preserves Codex usage-limit reset details"
  • node scripts/run-oxlint.mjs extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
  • Live or diagnostic Codex app-server proof for account/rateLimits/updated and/or mcpServer/startupStatus/updated without gateway liveness regression

What I checked:

  • PR diff: The patch adds waitForCodexNotificationDispatchTurn() using setImmediate, awaits it before projector.handleNotification(notification), and short-circuits pre-turn notifications into pendingNotifications before the queued projector path. (extensions/codex/src/app-server/run-attempt.ts:1319, 9437eb990235)
  • Current main notification queue: Current main registers enqueueNotification, chains notifications through notificationQueue.then(...), and awaits projector.handleNotification(notification) inside that queued handler. (extensions/codex/src/app-server/run-attempt.ts:1239, 0ad3d25fb7cd)
  • Client dispatch contract: The app-server client invokes notification handlers via Promise.resolve(handler(notification)) without awaiting them, so the per-run queue is the local serialization point for projection work. (extensions/codex/src/app-server/client.ts:467, 0ad3d25fb7cd)
  • Rate-limit projection path: The projector currently special-cases account/rateLimits/updated by storing the payload in latestRateLimits and the shared rate-limit cache before returning. (extensions/codex/src/app-server/event-projector.ts:172, 0ad3d25fb7cd)
  • Regression coverage in PR: The diff adds a focused test asserting a rate-limit notification is not projected after one microtask but is projected after the queued processing promise resolves, plus a turn/start failure rate-limit preservation adjustment. (extensions/codex/src/app-server/run-attempt.test.ts:1986, 9437eb990235)
  • Real behavior proof gate: The PR body reports targeted harness tests and oxlint only, and explicitly says live Codex OAuth/MCP gateway runs, Telegram delivery, AWS health checks, and production cron-triggered embedded runs were not tested. (9437eb990235)

Likely related people:

  • steipete: The app-server protocol bridge commit touched client.ts, run-attempt.ts, protocol, tests, and Codex command surfaces; recent run-attempt history also shows multiple app-server timeout and harness fixes. (role: original app-server bridge and recent area contributor; confidence: high; commits: 69566e43cb8b, 934fc6ceeb10, 1e31bd2ac29d; files: extensions/codex/src/app-server/client.ts, extensions/codex/src/app-server/run-attempt.ts)
  • pashpashpash: The usage-limit reset details commit added rate-limit-cache.ts and touched event-projector.ts, run-attempt.ts, and their tests; recent event-projector work also touched projection behavior. (role: rate-limit and event-projector contributor; confidence: medium; commits: b2c3202a15e7, 1a5548203e09, 78eb92e62277; files: extensions/codex/src/app-server/event-projector.ts, extensions/codex/src/app-server/rate-limit-cache.ts, extensions/codex/src/app-server/run-attempt.ts)
  • jalehman: Recent history shows work on app-server surrogate stalls in client.ts and MCP server preservation in the app-server harness, both adjacent to this notification-liveness report. (role: recent Codex app-server client and MCP-adjacent contributor; confidence: medium; commits: f64feab47a66, 1ee0d51e92f6; files: extensions/codex/src/app-server/client.ts, extensions/codex/src/app-server/run-attempt.ts)
  • zknicker: A same-day merged change touched both central files and their tests for Codex app-server prompt mirror behavior, making this a useful routing signal for current-main context. (role: recent run-attempt/event-projector contributor; confidence: medium; commits: 7715b29aa231; files: extensions/codex/src/app-server/event-projector.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)

Remaining risk / open question:

  • The PR body does not include after-fix proof from a live Codex OAuth/MCP gateway, production-like cron run, or AWS/liveness environment.
  • The focused regression test proves queue yielding for a synthetic rate-limit notification, but does not exercise a real mcpServer/startupStatus/updated payload or gateway health checks.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0ad3d25fb7cd.

@joshavant joshavant merged commit ea16a5e into main May 15, 2026
83 of 85 checks passed
@joshavant joshavant deleted the fix/codex-notification-yield-81936 branch May 15, 2026 21:53
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix(codex): yield app-server notification projection

* docs(changelog): note codex notification yield fix
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix(codex): yield app-server notification projection

* docs(changelog): note codex notification yield fix
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* fix(codex): yield app-server notification projection

* docs(changelog): note codex notification yield fix
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix(codex): yield app-server notification projection

* docs(changelog): note codex notification yield fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: @openclaw/codex notification handlers (account/rateLimits/updated, mcpServer/startupStatus/updated) synchronously block Node event loop

1 participant