Skip to content

fix(codex): backport shared app-server startup cleanup to .27#87428

Merged
steipete merged 2 commits into
release/2026.5.27from
backport/codex-shared-app-server-2026.5.27
May 27, 2026
Merged

fix(codex): backport shared app-server startup cleanup to .27#87428
steipete merged 2 commits into
release/2026.5.27from
backport/codex-shared-app-server-2026.5.27

Conversation

@steipete

Copy link
Copy Markdown
Contributor

Summary:

  • Backport fix(codex): preserve shared app-server after startup app errors #87399 to release/2026.5.27.
  • Preserve the single shared Codex app-server after logical thread/start startup errors in the older run-attempt startup path.
  • Keep cleanup for abandoned startup races: abort, startup timeout, and per-RPC startup timeout.
  • Keep the installed-plugin ledger cache fixture isolated for the release branch config test.

Behavior addressed: Codex helper/auth app-level startup failures no longer tear down the shared parent Codex app-server on the .27 release branch.
Real environment tested: Testbox through Crabbox, provider=blacksmith-testbox, id=tbx_01ksnprx3pgh6d3a9fqcf6nnkb.
Exact steps or command run after this patch: corepack pnpm install --frozen-lockfile && node scripts/run-vitest.mjs run extensions/codex/src/app-server/run-attempt.test.ts -t 'shared Codex client|thread/start connection close|logical thread/start error' && node scripts/run-vitest.mjs run --config test/vitest/vitest.runtime-config.config.ts src/config/config.plugin-validation.test.ts
Evidence after fix: Codex run-attempt focused tests passed: 1 file, 4 tests. Config plugin validation passed: 1 file, 37 tests. Autoreview clean: no accepted/actionable findings.
Observed result after fix: Logical thread/start errors preserve the shared client; connection-close and timeout startup paths still retire abandoned clients.
What was not tested: Live Codex auth failure against a real operator account; coverage uses mocked app-server startup plus CI-parity Testbox runtime.

Backport of #87399 / commit 7f7eca1.

* fix(codex): preserve shared app-server after startup app errors

* fix(codex): align startup cleanup tests with current types

* test(config): isolate installed plugin ledger cache

(cherry picked from commit 7f7eca1)
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 6:23 PM ET / 22:23 UTC.

Summary
Backports Codex shared app-server startup cleanup to release/2026.5.27, adjusts related run-attempt/config tests, and updates release-branch shrinkwrap/import compatibility helpers.

PR surface: Source +6, Tests +5, Other +49. Total +60 across 5 files.

Reproducibility: yes. from source inspection: the PR head only clears abort/timeout startup failures, while the app-server client can reject thread/start with raw write errors and the PR removes the write EPIPE regression case. I did not run tests because this review is read-only.

Review metrics: none identified.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Restore raw write EPIPE startup cleanup and its regression coverage on the release path.
  • Add redacted live terminal/log proof for the Codex helper/auth startup failure, or get explicit maintainer acceptance of that proof gap.

Proof guidance:
Needs real behavior proof before merge: The PR body reports Crabbox/Testbox Vitest runs with mocked app-server startup and explicitly says no live Codex auth/helper failure was tested, so real behavior proof is still needed or must be explicitly waived by maintainers. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • Merging this release-branch backport as-is can leave a shared Codex app-server cached after a raw startup pipe write failure, making later Codex attempts reuse a poisoned client instead of starting cleanly.
  • The PR body only provides mocked Testbox Vitest proof and explicitly does not test a live Codex helper/auth startup failure, so the real operator failure path remains unproved.
  • This is a protected maintainer-labeled release backport, so maintainers need to either repair the startup cleanup gap or explicitly accept the narrower behavior before merge.

Maintainer options:

  1. Backport the full startup-failure predicate (recommended)
    Add the current-main equivalent of shouldClearSharedClientAfterStartupFailure to the release run-attempt path and restore raw write EPIPE regression coverage before merge.
  2. Accept the narrower release fix
    Maintainers can intentionally ship only the logical app-error preservation and accept that raw startup write failures are not covered on this release branch.
  3. Pause for live startup proof
    Hold the release backport until redacted live Codex helper/auth startup proof shows the real failure path behaves as intended.

Next step before merge
Human review is needed because the PR has a concrete startup cleanup defect plus mock-only proof on a protected release backport.

Security
Cleared: The diff touches a shrinkwrap-generation script, but I found no new dependency source, download/execute path, secret exposure, or permission broadening beyond local override-merging logic.

Review findings

  • [P2] Keep clearing raw startup write failures — extensions/codex/src/app-server/run-attempt.ts:1853-1854
Review details

Best possible solution:

Backport the current-main raw startup-failure cleanup behavior into the release path, restore write EPIPE regression coverage, then merge only with live proof or explicit maintainer acceptance of the proof gap.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: the PR head only clears abort/timeout startup failures, while the app-server client can reject thread/start with raw write errors and the PR removes the write EPIPE regression case. I did not run tests because this review is read-only.

Is this the best way to solve the issue?

No. The right backport should preserve logical app-error clients while still clearing raw transport/write startup failures, matching the current-main follow-up behavior.

Full review comments:

  • [P2] Keep clearing raw startup write failures — extensions/codex/src/app-server/run-attempt.ts:1853-1854
    This catch now clears the shared startup client only for aborts or timeout-shaped errors. A thread/start pipe failure can still reject as a raw write EPIPE from writeMessage, and the PR changed the regression test away from that case; that leaves the broken shared client cached for later attempts. Please keep the logical app-error preservation, but also clear on raw startup transport/write failures as current main now does.
    Confidence: 0.92

Overall correctness: patch is incorrect
Overall confidence: 0.9

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5d437de70f95.

Label changes

Label changes:

  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets a release-branch Codex startup regression where failed shared app-server cleanup can affect real agent availability.
  • merge-risk: 🚨 availability: The diff can leave a broken Codex app-server client cached after raw startup write failures, degrading recovery for later attempts.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body reports Crabbox/Testbox Vitest runs with mocked app-server startup and explicitly says no live Codex auth/helper failure was tested, so real behavior proof is still needed or must be explicitly waived by maintainers. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +6, Tests +5, Other +49. Total +60 across 5 files.

View PR surface stats
Area Files Added Removed Net
Source 1 10 4 +6
Tests 3 11 6 +5
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 1 49 0 +49
Total 5 70 10 +60

What I checked:

  • PR head startup cleanup predicate: The PR head only clears the shared startup client when the run is aborted or the error matches startup timeout/abort text, so non-timeout transport write errors fall through without clearing the cached client. (extensions/codex/src/app-server/run-attempt.ts:1853, 43dc961db5da)
  • Removed raw write-failure regression coverage: The PR changes the release-branch regression test from a raw write EPIPE startup failure to the normalized codex app-server client is closed error, leaving the raw write path unproved. (extensions/codex/src/app-server/run-attempt.test.ts:12557, 43dc961db5da)
  • Client error contract: The app-server client only classifies codex app-server client is closed and codex app-server exited: as connection-closed errors, while writeMessage forwards raw write callback errors to the pending request. (extensions/codex/src/app-server/client.ts:80, 43dc961db5da)
  • Current-main follow-up behavior: Current main added shouldClearSharedClientAfterStartupFailure after the original shared-client cleanup and explicitly clears on write EPIPE, which is the behavior this release path should mirror. (extensions/codex/src/app-server/attempt-startup.ts:401, 7299c5695317)
  • History provenance: The raw write EPIPE cleanup regression test in this area dates to the earlier spawned-helper shared-client fix, so removing that exact case is a meaningful behavior change rather than only test wording. (extensions/codex/src/app-server/run-attempt.test.ts:3907, 74f9d6b96d4)
  • Proof evidence: The PR body reports Crabbox/Testbox Vitest runs against mocked app-server startup and explicitly says no live Codex auth failure against a real operator account was tested. (43dc961db5da)

Likely related people:

  • steipete: Authored the main shared app-server cleanup commit and this release-branch backport, and appears repeatedly in recent Codex app-server history. (role: recent area contributor; confidence: high; commits: 7f7eca1ad2ba, 3f03d5f5a594, 43dc961db5da; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/attempt-startup.ts, scripts/generate-npm-shrinkwrap.mjs)
  • Yuval Dinodia: Blame shows the prior spawned-helper shared-client cleanup and raw write EPIPE regression test came from the earlier related Codex app-server fix. (role: introduced related behavior; confidence: medium; commits: 74f9d6b96d4; files: extensions/codex/src/app-server/run-attempt.test.ts, extensions/codex/src/app-server/run-attempt.ts)
  • mbelinky: Recent current-main work added the startup-failure cleanup helper that explicitly keeps raw write EPIPE failures on the clear-client path. (role: recent adjacent owner; confidence: medium; commits: 7299c5695317; files: extensions/codex/src/app-server/attempt-startup.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f03d5f5a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1853 to 1854
if (runAbortController.signal.aborted || shouldClearSharedClientAfterStartupRace(error)) {
clearSharedCodexAppServerClientIfCurrent(startupClientForCleanup);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Retire the shared client on startup write failures

When startup fails because thread/start rejects with a raw pipe write error such as write EPIPE, this new predicate is false, so the outer catch no longer clears startupClientForCleanup; the retry loop also only handles isCodexAppServerConnectionClosedError, which does not match write EPIPE. CodexAppServerClient.writeMessage forwards write callback errors directly, and the previous test for this case was changed to codex app-server client is closed, so a startup pipe failure can leave the shared client cached and cause later attempts to reuse the same failed process instead of starting a fresh app-server.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 27, 2026
@steipete steipete requested a review from a team as a code owner May 27, 2026 22:18
@openclaw-barnacle openclaw-barnacle Bot added channel: telegram Channel integration: telegram scripts Repository scripts size: S and removed size: XS labels May 27, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 27, 2026
@steipete steipete merged commit c3d3d68 into release/2026.5.27 May 27, 2026
132 of 138 checks passed
@steipete steipete deleted the backport/codex-shared-app-server-2026.5.27 branch May 27, 2026 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: telegram Channel integration: telegram extensions: codex maintainer Maintainer-authored PR merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. scripts Repository scripts size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant