Skip to content

fix(codex): clarify unsafe app-server completion stalls#87793

Open
slatem wants to merge 1 commit into
openclaw:mainfrom
slatem:fix-codex-terminal-event-release
Open

fix(codex): clarify unsafe app-server completion stalls#87793
slatem wants to merge 1 commit into
openclaw:mainfrom
slatem:fix-codex-terminal-event-release

Conversation

@slatem

@slatem slatem commented May 28, 2026

Copy link
Copy Markdown

Summary

  • clarify the replay-unsafe Codex app-server timeout message so users know OpenClaw intentionally did not auto-replay after tool activity
  • include completed item id/type/status/tool metadata in app-server idle-timeout diagnostics, matching the raw response diagnostics already emitted
  • add focused coverage for completed item diagnostic details and update timeout outcome expectations

Why

When Codex app-server emits completed tool/item notifications but never sends the terminal turn/completed event, OpenClaw currently reports that work may already have happened but does not explain that replay was intentionally suppressed to avoid duplicate side effects. This makes the incident look like a generic bot failure instead of a replay-safety guard. The extra diagnostic fields also make future reports easier to triage from trajectory/log data.

Tests

  • OPENCLAW_VITEST_INCLUDE_FILE=/tmp/openclaw-codex-tests.json node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts
  • pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/attempt-notifications.ts extensions/codex/src/app-server/attempt-notifications.test.ts extensions/codex/src/app-server/attempt-results.ts extensions/codex/src/app-server/attempt-results.test.ts extensions/codex/src/app-server/run-attempt.turn-watches.test.ts

Related: #87781 handles replay-safe app-server stalls. This PR covers the replay-unsafe/user-facing side-effect path.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:10 AM ET / 05:10 UTC.

Summary
The branch updates Codex app-server replay-unsafe timeout wording, adds completed-item metadata to timeout diagnostics, and adds focused tests.

PR surface: Source +9, Tests +29. Total +38 across 5 files.

Reproducibility: no. high-confidence live reproduction is included in this PR. Source inspection and focused tests establish the timeout/diagnostic path, but the PR body only provides unit and format commands.

Review metrics: none identified.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Add redacted terminal/log/live output, terminal screenshot, or recording showing the replay-unsafe timeout message and completed-item metadata after the patch.
  • Update the PR body with the proof so ClawSweeper can review it again; redact private paths, IPs, API keys, phone numbers, and non-public endpoints.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body and comments provide tests only; it still needs redacted terminal/log/live output, a terminal screenshot, or a recording showing the after-fix replay-unsafe timeout message and item metadata. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P2] No redacted real run yet shows the after-fix replay-unsafe Codex app-server stall surfacing the clarified timeout message and completed-item metadata.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow diagnostics/message update after redacted real-run proof demonstrates the replay-unsafe timeout path; keep broader replay-safe recovery in fix(codex): prevent false completion stalls during native streams #87781 separate.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Contributor action is required for real after-fix proof; there is no concrete code defect for an automated repair PR to fix.

Security
Cleared: The diff only changes Codex app-server TypeScript helpers/tests and diagnostic text; no dependency, workflow, secret, or supply-chain surface is changed.

Review details

Best possible solution:

Land the narrow diagnostics/message update after redacted real-run proof demonstrates the replay-unsafe timeout path; keep broader replay-safe recovery in #87781 separate.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction is included in this PR. Source inspection and focused tests establish the timeout/diagnostic path, but the PR body only provides unit and format commands.

Is this the best way to solve the issue?

Yes for the code shape: the patch localizes the wording and diagnostic enrichment to Codex app-server helpers with focused tests. It is not merge-ready until real behavior proof confirms the runtime path.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0e86ca135225.

Label changes

Label justifications:

  • P2: This is a normal-priority Codex app-server diagnostic and user-facing timeout-message bugfix with limited blast radius.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body and comments provide tests only; it still needs redacted terminal/log/live output, a terminal screenshot, or a recording showing the after-fix replay-unsafe timeout message and item metadata. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +9, Tests +29. Total +38 across 5 files.

View PR surface stats
Area Files Added Removed Net
Source 2 12 3 +9
Tests 3 33 4 +29
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 5 45 7 +38

What I checked:

  • Repository policy read: Read the full root AGENTS.md and the scoped extensions/AGENTS.md; the proof gate and extension-boundary guidance apply to this Codex plugin PR. (AGENTS.md:1, 0e86ca135225)
  • Current main timeout message is still less explicit: Current main's side-effect timeout message warns that work may have happened but does not explain that OpenClaw intentionally avoided automatic replay. (extensions/codex/src/app-server/attempt-results.ts:10, 0e86ca135225)
  • Current main diagnostics omit item details for item/completed: Current main returns only lastNotificationMethod for non-rawResponseItem/completed notifications, so item/completed timeout diagnostics do not include item id/type/status/tool metadata. (extensions/codex/src/app-server/attempt-notifications.ts:27, 0e86ca135225)
  • Timeout outcome is surfaced to callers: runCodexAppServerAttempt attaches promptTimeoutOutcome to the final attempt result, so the edited message is user-visible on the timeout path. (extensions/codex/src/app-server/run-attempt.ts:2095, 0e86ca135225)
  • PR diff is focused: The PR patch changes the side-effect timeout string, adds item metadata extraction for notifications with params.item, and adds a focused attempt-notifications test. (extensions/codex/src/app-server/attempt-notifications.ts:24, 1f272b3dc3b9)
  • Real behavior proof remains missing: The live PR body lists unit and format commands only, and the existing ClawSweeper comment records that no redacted terminal/log/live output has shown the after-fix replay-unsafe timeout path. (1f272b3dc3b9)

Likely related people:

  • steipete: Central Codex app-server history shows Peter Steinberger introduced app-server controls and authored the module-split and lifecycle hardening commits around the affected timeout/result path. (role: feature-history owner; confidence: high; commits: 31a0b7bd42a5, 8d72aafdbb8d, 3b65e2302a55; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/attempt-results.ts, extensions/codex/src/app-server/attempt-notifications.ts)
  • vincentkoc: Recent history shows Vincent Koc changed Codex app-server startup/auth behavior in run-attempt.ts, adjacent to the affected runtime path. (role: recent adjacent contributor; confidence: medium; commits: f1cc8f0cfc7c, 859eb0666282; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 28, 2026
@RomneyDa

Copy link
Copy Markdown
Member

Heads up: this PR needs to be updated against current main before the new required Dependency Guard check can pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants