Skip to content

fix(codex): bound app-server timeout fallout#87022

Merged
steipete merged 1 commit into
mainfrom
codex-timeout-boundary-86948
May 26, 2026
Merged

fix(codex): bound app-server timeout fallout#87022
steipete merged 1 commit into
mainfrom
codex-timeout-boundary-86948

Conversation

@steipete

Copy link
Copy Markdown
Contributor

Summary

  • Detach timed-out Codex app-server clients from shared reuse, then best-effort interrupt, unsubscribe, and close them after active-turn timeouts.
  • Treat plugin-harness-owned timeout failures as harness lifecycle failures instead of auth-profile or fallback-model evidence.
  • Add regression coverage for timeout cleanup, shared-client detach, and harness-owned timeout failover decisions.

Refs #86948.

Verification

  • pnpm test src/agents/pi-embedded-runner/run/failover-policy.test.ts -- --reporter=verbose
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts --configLoader runner extensions/codex/src/app-server/run-attempt.test.ts extensions/codex/src/app-server/shared-client.test.ts --reporter=verbose
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts --configLoader runner extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose
  • pnpm check:changed via Blacksmith Testbox tbx_01ksjy9hvhe1700kv8sh1ahvmn
  • .agents/skills/autoreview/scripts/autoreview --mode local clean

Real behavior proof

Behavior addressed: Codex app-server turns that time out after being accepted no longer leave the same shared app-server client available for later turns, and harness-owned timeout failures no longer mark auth profiles failed or trigger model fallback.

Real environment tested: Local macOS checkout for targeted Vitest; Blacksmith Testbox for pnpm check:changed typecheck/lint/guards.

Exact steps or command run after this patch: See Verification commands above.

Evidence after fix: Codex app-server timeout regression test now expects turn/interrupt, thread/unsubscribe, and client close. Failover-policy tests cover harness-owned prompt, assistant, idle, and classified timeout paths.

Observed result after fix: Targeted Vitest passed; final Testbox check:changed exited 0; final autoreview reported no accepted/actionable findings.

What was not tested: Live Codex provider reproduction of issue #86948; reporter should still retry latest main/beta to confirm the original event-loop saturation scenario is gone.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: codex size: M maintainer Maintainer-authored PR labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 26, 2026, 5:38 PM ET / 21:38 UTC.

Summary
The PR adds lease-aware shared Codex app-server clients, retires and unsubscribes timed-out app-server turns, releases shared-client leases across Codex helper paths, and tightens harness-owned timeout failover decisions with regression coverage.

PR surface: Source +282, Tests +185. Total +467 across 17 files.

Reproducibility: no. high-confidence live reproduction was run in this review. The linked issue and current source identify the timeout/failover path, but the PR author still says a live Codex account timeout against a real app-server session was not exercised.

Review metrics: 1 noteworthy metric.

  • Timeout policy surfaces: 2 changed. The PR changes both Codex app-server client cleanup and embedded-runner auth/fallback decisions, so runtime proof matters more than unit-only validation.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Add redacted live Codex-provider proof showing the timeout cleanup and auth/fallback behavior after a real app-server timeout.
  • Keep the PR body updated with the proof so ClawSweeper can re-review automatically; if it does not, ask a maintainer to comment @clawsweeper re-review.

Proof guidance:
Needs real behavior proof before merge: The PR provides targeted tests, autoreview, and Testbox output only; it explicitly says the live Codex app-server timeout scenario was not exercised, so real behavior proof is still needed before merge with private details redacted. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • Live behavior proof is still missing: the latest proof is targeted tests, autoreview, and Testbox checks, while the live Codex account timeout/event-loop saturation path was explicitly not exercised.
  • The PR changes runtime recovery by retiring shared Codex app-server clients after timed-out turns, which could affect availability if active leases, retries, or follow-up turns behave differently in a real saturated app-server.
  • The PR also changes auth-profile rotation/provider fallback decisions for harness-owned timeout classifications, so maintainers need to own the routing policy before merge.

Maintainer options:

  1. Prove the live timeout path before merge (recommended)
    Add redacted live Codex-provider logs, terminal output, or linked artifact proof showing timeout cleanup retires the wedged client and leaves auth/fallback routing correct after the reported failure mode.
  2. Accept test-only proof as a maintainer decision
    Maintainers can intentionally land with the current targeted tests and Testbox gate, but should record that the linked live saturation scenario still needs post-merge confirmation.
  3. Pause if retirement policy is not owned
    If closing shared app-server clients after timeout is not the desired recovery contract, pause this PR and track the remaining live leak/root-cause work on the linked beta-blocker issue.

Next step before merge
Human maintainer review is needed because the PR has a protected maintainer label, changes availability/auth-provider behavior, and lacks live runtime proof; no narrow code defect was found for an automated repair lane.

Security
Cleared: No dependency, workflow, lockfile, package-script, permission, secret-handling, or supply-chain change was found; auth/fallback changes are functional merge risk rather than a new security concern.

Review details

Best possible solution:

Land only after maintainer approval of the timeout-retirement policy and redacted live Codex-provider proof showing a timed-out app-server session no longer poisons auth/fallback state or reuses a wedged shared client.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was run in this review. The linked issue and current source identify the timeout/failover path, but the PR author still says a live Codex account timeout against a real app-server session was not exercised.

Is this the best way to solve the issue?

Unclear until live proof and maintainer policy approval exist. The lease-aware retirement path is a plausible narrow fix, but it changes availability and auth/fallback behavior in a timeout recovery path.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a818556dd9c2.

Label changes

Label justifications:

  • P1: The PR targets a beta-blocking Codex runtime failure that can silently drop turns and poison auth/fallback state for real users.
  • merge-risk: 🚨 auth-provider: The diff changes when harness-owned timeout failures rotate auth profiles or trigger provider/model fallback.
  • merge-risk: 🚨 availability: The diff retires shared Codex app-server clients after active-turn timeouts, affecting recovery and runtime availability after timeout events.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR provides targeted tests, autoreview, and Testbox output only; it explicitly says the live Codex app-server timeout scenario was not exercised, so real behavior proof is still needed before merge with private details redacted. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +282, Tests +185. Total +467 across 17 files.

View PR surface stats
Area Files Added Removed Net
Source 10 553 271 +282
Tests 7 192 7 +185
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 17 745 278 +467

What I checked:

  • Repository policy read: Root AGENTS.md and scoped extensions/src/agents policies were read; their proof, protected-label, plugin-boundary, and fallback-risk guidance affects this review. (AGENTS.md:1, a818556dd9c2)
  • Current main timeout behavior: Current main's active-turn abort listener only best-effort interrupts the Codex turn and resolves completion; it does not retire the shared client on timeout. (extensions/codex/src/app-server/run-attempt.ts:3231, a818556dd9c2)
  • PR adds timeout retirement: The PR head changes the abort listener to call a timeout-retirement helper, which interrupts, unsubscribes, and closes or retires the shared app-server client after timed-out turns. (extensions/codex/src/app-server/run-attempt.ts:3256, 5bc2feee20d0)
  • PR adds shared-client leasing: The PR head adds leased shared-client acquisition/release plus delayed close for retired clients with active leases, which is central to bounding timeout cleanup without closing in-use clients. (extensions/codex/src/app-server/shared-client.ts:113, 5bc2feee20d0)
  • PR changes failover policy: The PR head treats harness-owned assistant failures classified with failoverReason "timeout" as harness-owned timeouts, avoiding auth rotation/fallback when there is no concrete non-timeout failure. (src/agents/pi-embedded-runner/run/failover-policy.ts:119, 5bc2feee20d0)
  • Related issue remains open: The canonical linked beta-blocker issue remains open with labels for message loss, auth-provider, crash-loop, needs-maintainer-review, and needs-live-repro, so this PR should stay open as an implementation candidate rather than be cleanup-closed.

Likely related people:

  • pashpashpash: Commit 3a64dc7 changed the current-main Codex timeout/failover policy that this PR extends and partly revises. (role: recent timeout-policy contributor; confidence: high; commits: 3a64dc762385; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run/failover-policy.ts, src/agents/pi-embedded-runner/run/failover-policy.test.ts)
  • steipete: Current-main history shows Peter Steinberger authored multiple Codex app-server lifecycle and embedded-runner failover commits in the affected paths, and this PR also comes from that branch author. (role: feature-history owner and PR author; confidence: high; commits: 659bcc5e5b59, 545490c5920d, 9ac7a0398213; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/shared-client.ts, src/agents/pi-embedded-runner/run/failover-policy.ts)
  • Andy Ye: A nearby current-main Codex app-server resume fix touched run-attempt.ts shortly before this PR, making this a useful adjacent routing signal. (role: recent adjacent contributor; confidence: medium; commits: bf0228b5c2b2; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@steipete steipete force-pushed the codex-timeout-boundary-86948 branch from 7f62935 to 5bc2fee Compare May 26, 2026 21:31
@steipete

Copy link
Copy Markdown
Contributor Author

Behavior addressed: Codex app-server turn timeouts now best-effort interrupt/unsubscribe and retire the shared client with lease-aware cleanup; harness-owned timeout classifications no longer rotate/fallback unless a concrete retryable reason is present.

Real environment tested: local macOS targeted Vitest runs plus Blacksmith Testbox changed gate.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts --configLoader runner extensions/codex/src/app-server/shared-client.test.ts extensions/codex/src/app-server/run-attempt.test.ts extensions/codex/src/app-server/compact.test.ts extensions/codex/src/app-server/request.test.ts extensions/codex/src/app-server/side-question.test.ts extensions/codex/src/conversation-control.test.ts extensions/codex/src/conversation-binding.test.ts --reporter=verbose
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts --configLoader runner extensions/codex/src/app-server/shared-client.test.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts --configLoader runner extensions/codex/src/app-server/shared-client.test.ts --reporter=verbose
  • pnpm test src/agents/pi-embedded-runner/run/failover-policy.test.ts -- --reporter=verbose
  • /Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • pnpm check:changed

Evidence after fix: targeted Codex extension suites passed (313 tests before the second-retirement regression fix; then 249 shared/run-attempt tests and 20 shared-client tests after the fix). Failover-policy suite passed (31 tests). Autoreview reported clean with no accepted/actionable findings after the lease call-path and factory fixes. pnpm check:changed passed on Testbox tbx_01ksk303jap8ya50tntdgw3seh / Actions run 26476184845.

Observed result after fix: timeout retirement no longer closes an actively leased shared app-server client immediately, second retirement attempts on an already-retired leased client stay lease-aware, and default non-run-attempt factory callers do not leak leases.

What was not tested: live Codex account timeout against a real app-server session was not exercised; proof is unit/integration-style targeted tests plus changed gate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant