fix(codex): clarify unsafe app-server completion stalls#87793
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:10 AM ET / 05:10 UTC. Summary PR surface: Source +9, Tests +29. Total +38 across 5 files. Reproducibility: no. high-confidence live reproduction is included in this PR. Source inspection and focused tests establish the timeout/diagnostic path, but the PR body only provides unit and format commands. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the narrow diagnostics/message update after redacted real-run proof demonstrates the replay-unsafe timeout path; keep broader replay-safe recovery in #87781 separate. Do we have a high-confidence way to reproduce the issue? No high-confidence live reproduction is included in this PR. Source inspection and focused tests establish the timeout/diagnostic path, but the PR body only provides unit and format commands. Is this the best way to solve the issue? Yes for the code shape: the patch localizes the wording and diagnostic enrichment to Codex app-server helpers with focused tests. It is not merge-ready until real behavior proof confirms the runtime path. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 0e86ca135225. Label changesLabel justifications:
Evidence reviewedPR surface: Source +9, Tests +29. Total +38 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Heads up: this PR needs to be updated against current |
Summary
Why
When Codex app-server emits completed tool/item notifications but never sends the terminal
turn/completedevent, OpenClaw currently reports that work may already have happened but does not explain that replay was intentionally suppressed to avoid duplicate side effects. This makes the incident look like a generic bot failure instead of a replay-safety guard. The extra diagnostic fields also make future reports easier to triage from trajectory/log data.Tests
OPENCLAW_VITEST_INCLUDE_FILE=/tmp/openclaw-codex-tests.json node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.tspnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/attempt-notifications.ts extensions/codex/src/app-server/attempt-notifications.test.ts extensions/codex/src/app-server/attempt-results.ts extensions/codex/src/app-server/attempt-results.test.ts extensions/codex/src/app-server/run-attempt.turn-watches.test.tsRelated: #87781 handles replay-safe app-server stalls. This PR covers the replay-unsafe/user-facing side-effect path.