fix(codex): backport shared app-server startup cleanup to .27#87428
Conversation
* fix(codex): preserve shared app-server after startup app errors * fix(codex): align startup cleanup tests with current types * test(config): isolate installed plugin ledger cache (cherry picked from commit 7f7eca1)
|
Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 6:23 PM ET / 22:23 UTC. Summary PR surface: Source +6, Tests +5, Other +49. Total +60 across 5 files. Reproducibility: yes. from source inspection: the PR head only clears abort/timeout startup failures, while the app-server client can reject Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance: Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Backport the current-main raw startup-failure cleanup behavior into the release path, restore Do we have a high-confidence way to reproduce the issue? Yes from source inspection: the PR head only clears abort/timeout startup failures, while the app-server client can reject Is this the best way to solve the issue? No. The right backport should preserve logical app-error clients while still clearing raw transport/write startup failures, matching the current-main follow-up behavior. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 5d437de70f95. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +6, Tests +5, Other +49. Total +60 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f03d5f5a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (runAbortController.signal.aborted || shouldClearSharedClientAfterStartupRace(error)) { | ||
| clearSharedCodexAppServerClientIfCurrent(startupClientForCleanup); |
There was a problem hiding this comment.
Retire the shared client on startup write failures
When startup fails because thread/start rejects with a raw pipe write error such as write EPIPE, this new predicate is false, so the outer catch no longer clears startupClientForCleanup; the retry loop also only handles isCodexAppServerConnectionClosedError, which does not match write EPIPE. CodexAppServerClient.writeMessage forwards write callback errors directly, and the previous test for this case was changed to codex app-server client is closed, so a startup pipe failure can leave the shared client cached and cause later attempts to reuse the same failed process instead of starting a fresh app-server.
Useful? React with 👍 / 👎.
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
Summary:
Behavior addressed: Codex helper/auth app-level startup failures no longer tear down the shared parent Codex app-server on the .27 release branch.
Real environment tested: Testbox through Crabbox, provider=blacksmith-testbox, id=tbx_01ksnprx3pgh6d3a9fqcf6nnkb.
Exact steps or command run after this patch: corepack pnpm install --frozen-lockfile && node scripts/run-vitest.mjs run extensions/codex/src/app-server/run-attempt.test.ts -t 'shared Codex client|thread/start connection close|logical thread/start error' && node scripts/run-vitest.mjs run --config test/vitest/vitest.runtime-config.config.ts src/config/config.plugin-validation.test.ts
Evidence after fix: Codex run-attempt focused tests passed: 1 file, 4 tests. Config plugin validation passed: 1 file, 37 tests. Autoreview clean: no accepted/actionable findings.
Observed result after fix: Logical thread/start errors preserve the shared client; connection-close and timeout startup paths still retire abandoned clients.
What was not tested: Live Codex auth failure against a real operator account; coverage uses mocked app-server startup plus CI-parity Testbox runtime.
Backport of #87399 / commit 7f7eca1.