Skip to content

fix(codex): unsubscribe bound threads after turns#85496

Open
leno23 wants to merge 1 commit into
openclaw:mainfrom
leno23:leno23/fix-codex-bound-thread-unsubscribe-85458
Open

fix(codex): unsubscribe bound threads after turns#85496
leno23 wants to merge 1 commit into
openclaw:mainfrom
leno23:leno23/fix-codex-bound-thread-unsubscribe-85458

Conversation

@leno23

@leno23 leno23 commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Tests

  • node scripts/run-vitest.mjs extensions/codex/src/conversation-binding.test.ts
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt-thread-cleanup.test.ts
  • git diff --check
  • node_modules/.bin/oxfmt --check extensions/codex/src/conversation-binding.ts extensions/codex/src/conversation-binding.test.ts

Real behavior proof

Behavior addressed: Codex app-server conversation bindings now best-effort unsubscribe the bound thread after every turn attempt, including the missing-thread recovery retry path, so a recreated bound thread is not left subscribed when the retry rejects.
Real environment tested: Local macOS OpenClaw checkout on the PR branch, using the real Codex app-server conversation-binding runtime entrypoint with an isolated temporary HOME/CODEX_HOME and a stdio app-server process that records JSON-RPC calls.
Exact steps or command run after this patch: node --import tsx .tmp-codex-proof.ts.
Evidence after fix: Copied terminal output from the runtime command:

{
  "result": {
    "handled": true,
    "reply": {
      "text": "Codex app-server turn failed: retry failed after recovery"
    }
  },
  "calls": [
    { "id": 2, "method": "turn/start", "threadId": "thread-old" },
    { "id": 3, "method": "thread/unsubscribe", "threadId": "thread-old" },
    { "id": 4, "method": "thread/start" },
    { "id": 5, "method": "turn/start", "threadId": "thread-new" },
    { "id": 6, "method": "thread/unsubscribe", "threadId": "thread-new" }
  ],
  "savedThreadId": "thread-new"
}

Observed result after fix: The command exited 0 and returned Codex app-server turn failed: retry failed after recovery, with both old and replacement thread unsubscribe calls recorded in order.
What was not tested: No live Codex hosted session was used; the proof isolates auth and exercises the OpenClaw runtime entrypoint against a local stdio app-server process, plus the targeted Vitest coverage listed above.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge. Reviewed June 5, 2026, 5:57 AM ET / 09:57 UTC.

Summary
The PR adds best-effort Codex app-server thread/unsubscribe cleanup after bound conversation turn attempts and regression coverage for missing-thread recovery retry failures.

PR surface: Source +26, Tests +96. Total +122 across 2 files.

Reproducibility: yes. from source inspection: current main’s bound conversation turn path can finish or fail without sending thread/unsubscribe. I did not run a live Codex session; the PR supplies a focused runtime proof and regression test for the failure path.

Review metrics: none identified.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🦞 diamond lobster
Patch quality: 🦐 gold shrimp
Result: needs maintainer review before merge.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Rebase on current main and resolve the dirty merge state.
  • Replace the local bound-thread unsubscribe helper with the existing app-server cleanup helper.

Risk before merge

  • [P1] The PR is currently dirty against main, so it cannot be merged until the branch is rebased or conflicts are resolved.
  • [P1] Current main already has a generic Codex app-server unsubscribe helper; keeping the PR’s local helper after rebase would create duplicate cleanup logic and drift risk.

Maintainer options:

  1. Decide the mitigation before merge
    Rebase onto current main, call the shared app-server unsubscribe helper from the bound-turn finally path, and keep the new recovery-failure regression coverage.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] The PR needs contributor or maintainer branch refresh against current main before normal review can continue; the code issue is narrow, but the live merge state is dirty.

Security
Cleared: The diff changes Codex runtime cleanup and tests only; no dependency, workflow, secret, package, or supply-chain surface is introduced.

Review findings

  • [P1] Reuse the existing app-server unsubscribe helper — extensions/codex/src/conversation-binding.ts:512-528
Review details

Best possible solution:

Rebase onto current main, call the shared app-server unsubscribe helper from the bound-turn finally path, and keep the new recovery-failure regression coverage.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection: current main’s bound conversation turn path can finish or fail without sending thread/unsubscribe. I did not run a live Codex session; the PR supplies a focused runtime proof and regression test for the failure path.

Is this the best way to solve the issue?

No as submitted against current main: the cleanup belongs in runBoundTurn’s finally, but the best current shape is to reuse unsubscribeCodexThreadBestEffort and CODEX_APP_SERVER_UNSUBSCRIBE_TIMEOUT_MS from the existing app-server cleanup module.

Full review comments:

  • [P1] Reuse the existing app-server unsubscribe helper — extensions/codex/src/conversation-binding.ts:512-528
    Current main already exports unsubscribeCodexThreadBestEffort and CODEX_APP_SERVER_UNSUBSCRIBE_TIMEOUT_MS from extensions/codex/src/app-server/attempt-client-cleanup.ts. Keeping this local copy after rebase creates a second cleanup contract and log path that can drift from the ordinary app-server attempt cleanup.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.84

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1a3ce7c2a8da.

Label changes

Label changes:

  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR body includes after-fix terminal output from a local runtime entrypoint with isolated HOME/CODEX_HOME and a stdio app-server process recording the expected turn/start, thread/start, and thread/unsubscribe sequence.
  • remove rating: 🦞 diamond lobster: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P2: This is a normal-priority Codex cleanup bugfix for a latent bound-thread subscription leak with limited blast radius.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR body includes after-fix terminal output from a local runtime entrypoint with isolated HOME/CODEX_HOME and a stdio app-server process recording the expected turn/start, thread/start, and thread/unsubscribe sequence.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a local runtime entrypoint with isolated HOME/CODEX_HOME and a stdio app-server process recording the expected turn/start, thread/start, and thread/unsubscribe sequence.
Evidence reviewed

PR surface:

Source +26, Tests +96. Total +122 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 29 3 +26
Tests 1 103 7 +96
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 132 10 +122

What I checked:

Likely related people:

  • steipete: Recent commits touched the Codex conversation-binding path, introduced the current app-server cleanup helper split, and carried bound app-server timeout cleanup. (role: recent area contributor; confidence: high; commits: 6868cde4d45f, a4c2e7f5cf1b, 5a7d5c6defe5; files: extensions/codex/src/conversation-binding.ts, extensions/codex/src/app-server/attempt-client-cleanup.ts, extensions/codex/src/app-server/run-attempt.ts)
  • keshavbotagent: The missing Codex bound-thread recovery flow appears in the commit titled fix: recover missing Codex bound threads, which is the recovery path this PR extends with cleanup. (role: introduced recovery behavior; confidence: medium; commits: a373468d8252; files: extensions/codex/src/conversation-binding.ts, extensions/codex/src/conversation-binding.test.ts)
  • joshavant: Recent Codex binding and exec-policy commits modified the same conversation-binding behavior around approval, sandbox, and execution policy preservation. (role: adjacent owner; confidence: medium; commits: b5f8191887b9, e0405ecc9bd6, ba06376c7955; files: extensions/codex/src/conversation-binding.ts, extensions/codex/src/conversation-binding.test.ts)
  • vincentkoc: Recent history shows shared Codex thread-binding and provider-identity refactors in the same binding surface, useful for routing rebase conflicts. (role: recent refactor contributor; confidence: medium; commits: 815ffb3bb2e1, 4c33aaa86c16; files: extensions/codex/src/conversation-binding.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Clockwork Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: finds missing screenshots.
Image traits: location review cove; accessory tiny test log scroll; palette moss green and polished brass; mood calm; pose nestled inside a glowing shell; shell smooth pearl shell; lighting subtle sparkle highlights; background soft code-shaped tiles.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Clockwork Diff Drake in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 22, 2026
@leno23 leno23 marked this pull request as draft May 26, 2026 15:49
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 26, 2026
@leno23 leno23 force-pushed the leno23/fix-codex-bound-thread-unsubscribe-85458 branch from 5d50a84 to f2405d0 Compare May 26, 2026 17:20
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 26, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 26, 2026
@steipete steipete self-assigned this May 28, 2026
@leno23 leno23 marked this pull request as ready for review June 5, 2026 09:49
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: S status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Potential resource leak in runBoundTurnWithMissingThreadRecovery -- retry not wrapped in try/finally after thread spawn

2 participants