Skip to content

fix(gateway): try SIGUSR1 restart before spawning external processes#68507

Closed
grimmjoww wants to merge 1 commit intoopenclaw:mainfrom
grimmjoww:fix/gateway-restart-sigusr1
Closed

fix(gateway): try SIGUSR1 restart before spawning external processes#68507
grimmjoww wants to merge 1 commit intoopenclaw:mainfrom
grimmjoww:fix/gateway-restart-sigusr1

Conversation

@grimmjoww
Copy link
Copy Markdown

See the diff

The openclaw gateway restart command fails when called from inside the gateway
because triggerOpenClawRestart() goes straight to platform-specific external
restart (systemd/launchctl/schtasks), which spawns child processes that die
when the parent gateway dies.

The /restart text command already handles this correctly by trying SIGUSR1
first (in-process restart). This change makes the CLI path behave the same.

Fixes: openclaw#40932 (gateway restart fails from exec tool)
Fixes: openclaw#62442 (session state loss after restart)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67c398f860

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/infra/restart.ts
Comment on lines +302 to +304
if (process.listenerCount("SIGUSR1") > 0) {
scheduleGatewaySigusr1Restart({ reason: "cli" });
return { ok: true, method: "sigusr1", tried: [] };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep schtasks restart path when SIGUSR1 listeners exist

This early return bypasses platform-specific restart logic whenever a SIGUSR1 listener is registered, which is true inside the gateway run loop. In the Windows supervised path (restartGatewayProcessWithFreshPid), that means triggerOpenClawRestart reports success via scheduled in-process SIGUSR1, the process exits as supervised, and the scheduled timer never fires; because relaunchGatewayScheduledTask is skipped, the Scheduled Task restart is not triggered and the gateway can stop instead of restarting.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

This PR adds an early-return to triggerOpenClawRestart() that checks for registered SIGUSR1 listeners and, when found, delegates to scheduleGatewaySigusr1Restart() instead of spawning external processes (launchctl, systemctl, etc.). It also extends the RestartAttempt.method union with "sigusr1". The fix addresses the structural problem of external child processes dying when their parent gateway exits before they complete.

Confidence Score: 5/5

Safe to merge; logic is sound and the only finding is missing test coverage for the new branch.

The SIGUSR1 listener-count guard correctly identifies in-gateway context (Node's --inspect handler is OS-level and invisible to listenerCount, so there's no false-positive risk). The RestartAttempt type union is properly extended. cleanStaleGatewayProcessesSync is correctly skipped on the in-process path. scheduleGatewaySigusr1Restart always returns ok:true so dropping its return value is harmless. The single P2 finding (no unit test for the new branch) does not affect runtime correctness.

src/infra/restart.ts — new SIGUSR1 branch in triggerOpenClawRestart lacks a unit test.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/restart.ts
Line: 301-305

Comment:
**No test coverage for the new SIGUSR1 early-return branch**

`restart.test.ts` only exercises the stale-PID helpers, and `process-respawn.test.ts` mocks `triggerOpenClawRestart` entirely, so the new branch is never exercised by the test suite. A minimal unit test could register a temporary `SIGUSR1` listener on `process`, call `triggerOpenClawRestart()`, assert that the result has `method === "sigusr1"` and `ok === true`, and verify `scheduleGatewaySigusr1Restart` was called (mocked). This matches the pattern recommended by CLAUDE.md for narrowly scoped changes.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(gateway): try SIGUSR1 restart before..." | Re-trigger Greptile

Comment thread src/infra/restart.ts
Comment on lines +301 to +305
// without the structural problem of spawning external processes that die with parent.
if (process.listenerCount("SIGUSR1") > 0) {
scheduleGatewaySigusr1Restart({ reason: "cli" });
return { ok: true, method: "sigusr1", tried: [] };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No test coverage for the new SIGUSR1 early-return branch

restart.test.ts only exercises the stale-PID helpers, and process-respawn.test.ts mocks triggerOpenClawRestart entirely, so the new branch is never exercised by the test suite. A minimal unit test could register a temporary SIGUSR1 listener on process, call triggerOpenClawRestart(), assert that the result has method === "sigusr1" and ok === true, and verify scheduleGatewaySigusr1Restart was called (mocked). This matches the pattern recommended by CLAUDE.md for narrowly scoped changes.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/restart.ts
Line: 301-305

Comment:
**No test coverage for the new SIGUSR1 early-return branch**

`restart.test.ts` only exercises the stale-PID helpers, and `process-respawn.test.ts` mocks `triggerOpenClawRestart` entirely, so the new branch is never exercised by the test suite. A minimal unit test could register a temporary `SIGUSR1` listener on `process`, call `triggerOpenClawRestart()`, assert that the result has `method === "sigusr1"` and `ok === true`, and verify `scheduleGatewaySigusr1Restart` was called (mocked). This matches the pattern recommended by CLAUDE.md for narrowly scoped changes.

How can I resolve this? If you propose a fix, please make it concise.

@grimmjoww
Copy link
Copy Markdown
Author

Closing this PR — Codex's P1 is correct: on Windows, the gateway run-loop always registers a SIGUSR1 listener, so the early-return fires when triggerOpenClawRestart() is re-entered from restartGatewayProcessWithFreshPid → schtasks branch, which means relaunchGatewayScheduledTask never runs and the scheduled task isn't told to relaunch the process after exit.

After re-reading the call graph, the root cause is structural, not platform-specific: triggerOpenClawRestart has two caller classes with contradictory needs, and the schtasks branch of restartGatewayProcessWithFreshPid should bypass it and go directly to relaunchGatewayScheduledTask (the same pattern launchd already uses via scheduleDetachedLaunchdRestartHandoff). Opening a revised PR with that fix + the unit test Greptile asked for.

Thanks to @codex and @greptile-bot for the review.

@grimmjoww grimmjoww closed this Apr 19, 2026
@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 19, 2026

Thanks for the thorough write-up — the call graph analysis makes the structural issue clear. The launchd parallel (scheduleDetachedLaunchdRestartHandoff bypassing triggerOpenClawRestart) is a clean model for the schtasks fix. Looking forward to reviewing the revised PR.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules and .greptile/config.

vincentkoc pushed a commit to grimmjoww/waifuclaw that referenced this pull request Apr 27, 2026
…htasks

Addresses openclaw#40932 (gateway restart fails from exec tool) without
introducing the Windows regression that caused closed PR openclaw#68507 to be
reverted.

Two structural changes:

* Add a SIGUSR1 fast path to `triggerOpenClawRestart()`. When a
  SIGUSR1 listener is registered (i.e. the gateway run-loop is up),
  route the restart through `scheduleGatewaySigusr1Restart` instead of
  spawning an external `systemctl` / `launchctl` / `schtasks` process.
  This restores the openclaw#40932 fix for direct callers such as the exec
  tool, which got no such protection previously.

* Route the Windows schtasks supervisor-handoff in
  `restartGatewayProcessWithFreshPid` directly to
  `relaunchGatewayScheduledTask`, mirroring the launchd handling.
  Going through `triggerOpenClawRestart` from that codepath would
  re-enter the SIGUSR1 fast path from inside the SIGUSR1 drain/exit
  sequence, skipping `relaunchGatewayScheduledTask` — which is the
  only thing that tells the Windows Scheduled Task to start a new
  instance after exit. The gateway would stop instead of restart.
  This was Codex's P1 on closed PR openclaw#68507.

Tests:
* `process-respawn.test.ts`: update the Windows supervised test to
  assert `relaunchGatewayScheduledTask` is called directly with
  `process.env`; add a test covering schtasks failure propagation.
* `restart-trigger.test.ts` (new): cover the SIGUSR1 fast path on
  linux and win32, plus the platform-dispatch fall-through when no
  listener is registered (addresses Greptile's P2 from openclaw#68507).

AI-assisted: drafted and implemented with Claude Code. Testing level:
lightly tested (targeted unit tests pass; no live Windows schtasks
repro available from this machine — the fix is validated structurally
by ensuring the schtasks branch of process-respawn no longer re-enters
`triggerOpenClawRestart`).

Refs: openclaw#40932, closed openclaw#68507

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant