gateway restart/update can fail to come back when respawn reuses unstable package-manager paths #52313
Open
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
- clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
- issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
- stale: Marked as stale due to inactivity
Summary
Message-triggered restart/update.run can occasionally fail to bring the gateway back after shutdown.
What is happening
The gateway run loop already tries to do a full fresh-process restart after SIGUSR1, but restartGatewayProcessWithFreshPid() currently respawns the child with process.execArgv + process.argv.slice(1).
That is brittle when the running process was launched from a package-manager-managed realpath, especially pnpm versioned paths like:
node_modules/.pnpm/openclaw@<version>/node_modules/openclaw/dist/entry.js
During self-update, that versioned realpath may be replaced or removed. The parent exits cleanly, but the child can then be spawned against an entrypoint that is no longer stable, which matches the observed "message channel restart/update ran, process went down, did not come back" behavior.
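To make the failure mode concrete, here is a minimal sketch of how the respawn arguments end up carrying the versioned realpath. The helper names buildRespawnArgs and isPnpmVersionedPath are hypothetical and not taken from the codebase; only the execArgv + argv.slice(1) composition comes from the description above.

```typescript
// Illustrative sketch: helper names are hypothetical, not the project's API.

/**
 * Rebuilds the child's argument list the way the issue describes the
 * current respawn: node flags followed by everything after argv[0].
 * argv[1] is the entry script path, so it is carried over unchanged.
 */
function buildRespawnArgs(
  execArgv: readonly string[],
  argv: readonly string[],
): string[] {
  return [...execArgv, ...argv.slice(1)];
}

/**
 * Returns true when a path goes through pnpm's versioned store directory
 * (node_modules/.pnpm/...), which may be replaced during a self-update.
 */
function isPnpmVersionedPath(path: string): boolean {
  // Normalize Windows separators so one check covers both styles.
  const normalized = path.replace(/\\/g, "/");
  return (
    normalized.startsWith("node_modules/.pnpm/") ||
    normalized.includes("/node_modules/.pnpm/")
  );
}
```

Fed with process.execArgv and process.argv, the first element after the node flags is whatever realpath the parent was launched from. If isPnpmVersionedPath returns true for that element, the child depends on a directory that the update itself may remove.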
Why this matters
This is most visible on message-triggered flows because the restart/update is initiated from the running gateway itself, so there is no external operator retrying the command.
Expected behavior
Gateway self-restart should respawn via a stable wrapper/symlinked CLI entrypoint that survives package updates, not by blindly reusing the current argv path.
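One way this could look, as a rough sketch rather than the actual patch: choose the first stable candidate that still exists and swap it in for argv[1]. The names pickRespawnEntry and respawnArgsWithEntry are hypothetical, the existence check is injected so the sketch stays side-effect free, and falling back to the current entry when no candidate exists is an assumption.

```typescript
// Illustrative sketch: names and fallback behavior are assumptions.

type Exists = (path: string) => boolean;

/**
 * Returns the first stable candidate that exists. Falls back to the
 * current entry script when none is found (assumed behavior).
 */
function pickRespawnEntry(
  currentEntry: string,
  stableCandidates: readonly string[],
  exists: Exists,
): string {
  for (const candidate of stableCandidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return currentEntry;
}

/**
 * Builds respawn args with the entry script (argv[1]) replaced, keeping
 * node flags in front and every later CLI argument untouched.
 */
function respawnArgsWithEntry(
  execArgv: readonly string[],
  argv: readonly string[],
  entry: string,
): string[] {
  return [...execArgv, entry, ...argv.slice(2)];
}
```

In a real process the exists callback would likely wrap something like fs.existsSync, and the candidate list would hold wrapper paths such as node_modules/openclaw/openclaw.mjs resolved against the install root.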
Proposed fix
Before detached respawn:
- node_modules/openclaw/openclaw.mjs wrapper
- <packageRoot>/openclaw.mjs
- src/entry.ts
Validation
I have a fix prepared locally on top of the latest green main commit:
- 52a0aa06723fbad5e7c2b0fc07fe04eef433d1c7
- pnpm exec vitest run src/infra/process-respawn.test.ts
- pnpm exec oxlint --type-aware src/infra/process-respawn.ts src/infra/process-respawn.test.ts
Full-repo tsc currently reports unrelated pre-existing test typing errors around fetch mocks, so I am not using that as the gating signal for this issue.