
[Bug]: Potential resource leak in runBoundTurnWithMissingThreadRecovery -- retry not wrapped in try/finally after thread spawn #85458

@joking100182

Description

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Source review of dist/node-cli-sessions-C0-r5Nk8.js (OpenClaw 2026.5.18) found that runBoundTurnWithMissingThreadRecovery spawns a new conversation thread via startCodexConversationThread and then awaits the runBoundTurn retry without wrapping it in try/finally. If the awaited retry rejects, the freshly spawned thread has no cleanup path. This has not been observed at runtime in this deployment; it is reported as a source-level finding for maintainer review.

Steps to reproduce

NOT_ENOUGH_INFO

Expected behavior

NOT_ENOUGH_INFO

Actual behavior

NOT_ENOUGH_INFO

OpenClaw version

2026.5.18 (installed); latest release at time of filing is 2026.5.20 -- source lines referenced are from the 2026.5.18 dist build

Operating system

Debian 12 (LXC container, Proxmox host)

Install method

npm global (/usr/lib/node_modules/openclaw)

Model

NOT_ENOUGH_INFO (finding is source-level, model-independent)

Provider / routing chain

NOT_ENOUGH_INFO (finding is source-level, routing-independent)

Additional provider/model setup details

NOT_ENOUGH_INFO (finding is source-level, provider-independent)

Logs, screenshots, and evidence

Impact and severity

NOT_ENOUGH_INFO (no observed runtime occurrence; latent defect reported from source review only)

Additional information

Source-level analysis of the suspect function (paths and line numbers refer to OpenClaw 2026.5.18 as installed):

File: /usr/lib/node_modules/openclaw/dist/node-cli-sessions-C0-r5Nk8.js
Function: runBoundTurnWithMissingThreadRecovery (approx lines 709-730)

Observed structure (paraphrased; exact bytes available on request):

  1. Receive a runBoundTurn invocation that signals missing-thread state.
  2. Call startCodexConversationThread(...) (approx lines 716-727) to spawn a replacement conversation thread; await its readiness.
  3. Invoke runBoundTurn(...) again as a retry (approx line 728), awaiting its result, and return it.

Leak vector:

  • Step 3 is awaited directly without a surrounding try/catch or try/finally.
  • If the awaited retry rejects (network error, plugin error, auth error, etc.), control unwinds out of runBoundTurnWithMissingThreadRecovery before any teardown of the thread spawned in step 2.
  • The reference to the freshly spawned thread is held only in the local frame, so the recovery caller has no handle to release it.
  • The result is an orphaned Codex conversation thread that keeps occupying a session-state slot until an external sweep, if one exists, reclaims it.
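
To make the leak vector concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the OpenClaw code: the function bodies, the `liveThreads` set, the `MISSING_THREAD` error code, and the `failRetry` option are all hypothetical stand-ins, and only the function names are taken from the dist build.

```javascript
// Hypothetical model of the pattern; none of these bodies come from OpenClaw.
const liveThreads = new Set(); // stands in for occupied session-state slots
let nextThreadId = 1;

// Stand-in for the real thread spawn: registers a thread and returns a handle.
async function startCodexConversationThread() {
  const thread = { id: nextThreadId++ };
  liveThreads.add(thread.id);
  return thread;
}

// Stand-in for the real turn runner. `failRetry` is an illustrative switch
// that simulates a network/plugin/auth error on the retry.
async function runBoundTurn(thread, { failRetry = false } = {}) {
  if (thread === null) {
    const err = new Error("thread missing");
    err.code = "MISSING_THREAD"; // assumed signal, not the real one
    throw err;
  }
  if (failRetry) throw new Error("simulated retry failure");
  return { ok: true, threadId: thread.id };
}

// Mirrors the paraphrased structure: detect missing thread, spawn, retry.
async function runBoundTurnWithMissingThreadRecovery(opts) {
  try {
    return await runBoundTurn(null, opts);
  } catch (err) {
    if (err.code !== "MISSING_THREAD") throw err;
  }
  const thread = await startCodexConversationThread();
  // No try/catch or try/finally here: a rejection unwinds past `thread`,
  // and nothing outside this frame holds a handle to release it.
  return await runBoundTurn(thread, opts);
}
```

In this model, calling `runBoundTurnWithMissingThreadRecovery({ failRetry: true })` rejects and leaves one entry in `liveThreads` that no caller can reach.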

Why this is latent and not currently observed:

  • The retry path only runs when missing-thread state is detected; under normal operation runBoundTurn never enters this branch.
  • Even when it enters, the retry must additionally throw to trigger the leak; in steady-state deployments this is rare.

Suggested fix area (illustrative, not a patch):

  • Wrap the retry in try/catch (or try/finally) so that on rejection the spawned thread can be released via the same path that handles normal end-of-turn cleanup.
  • Alternatively, hoist the spawned-thread handle so that an outer scope (e.g. the session-manager that invoked recovery) can attach it to its lifecycle.
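
A rough sketch of the first option, under stated assumptions: `releaseThread` is a hypothetical cleanup helper (I have not identified the real end-of-turn cleanup entry point), and the other bodies are stand-ins rather than OpenClaw code. The sketch uses try/catch so that the thread is released only when the retry fails; whether a successful retry should also release the thread (try/finally) depends on whether the thread is meant to outlive the turn, which this report does not establish.

```javascript
// Hypothetical stand-ins; only the function names echo the dist build.
const liveThreads = new Set();
let nextThreadId = 1;

async function startCodexConversationThread() {
  const thread = { id: nextThreadId++ };
  liveThreads.add(thread.id);
  return thread;
}

// Hypothetical cleanup helper; the real release path is unknown to me.
async function releaseThread(thread) {
  liveThreads.delete(thread.id);
}

// `failRetry` is an illustrative switch that simulates a failing retry.
async function runBoundTurn(thread, { failRetry = false } = {}) {
  if (failRetry) throw new Error("simulated retry failure");
  return { ok: true, threadId: thread.id };
}

// Spawn a replacement thread and retry; release the thread if the retry fails.
async function retryOnFreshThread(opts) {
  const thread = await startCodexConversationThread();
  try {
    return await runBoundTurn(thread, opts);
  } catch (err) {
    try {
      await releaseThread(thread);
    } catch (_cleanupErr) {
      // Best-effort cleanup: never mask the original retry error.
    }
    throw err;
  }
}
```

The original error is rethrown unchanged, so callers see the same rejection as before; the only behavioral difference in this sketch is that the spawned thread no longer outlives a failed retry.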

Adjacent issues observed during duplicate search (cross-references, not duplicates):

Reporter notes:

  • I have no live reproduction; the orphan would manifest as a slowly-growing count of stranded Codex threads correlated with missing-thread recovery events and with subsequent retry errors. Recommend pairing this report with a counter or trace at the recovery entry so maintainers can detect occurrences in production.
  • I am happy to provide the exact byte range or a redacted snippet on request; full code body omitted here to keep the report concise.
  • Filed in line with the project's stated NOT_ENOUGH_INFO rule: this report is source-level only and claims no runtime evidence.
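
As an illustration of the counter suggested in the first note, a wrapper along these lines could be placed around whatever callable performs the spawn-and-retry step. The names `recoveryStats` and `withRecoveryCounters` are hypothetical and not part of OpenClaw; the logger defaults to `console.warn` only for the sake of the example.

```javascript
// Hypothetical counters; names are illustrative, not OpenClaw APIs.
const recoveryStats = { entered: 0, retryFailed: 0 };

// Wraps an async recovery step so that each entry and each failed retry is
// counted and logged. The original error is always rethrown unchanged.
function withRecoveryCounters(recoverStep, log = console.warn) {
  return async function instrumentedRecovery(...args) {
    recoveryStats.entered += 1;
    try {
      return await recoverStep(...args);
    } catch (err) {
      recoveryStats.retryFailed += 1;
      log(
        `missing-thread recovery retry failed ` +
          `(${recoveryStats.retryFailed} of ${recoveryStats.entered} entries): ` +
          `${err.message}`
      );
      throw err;
    }
  };
}
```

A `retryFailed` count that grows alongside the number of stranded threads would be the runtime signal this report currently lacks.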

Metadata

Assignees

No one assigned

    Labels

    P2 - Normal backlog priority with limited blast radius.
    bug - Something isn't working
    clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
    impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
    issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
