fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks#68853
fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks#68853grimmjoww wants to merge 3 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR restores the SIGUSR1 fast path for Confidence Score: 5/5Safe to merge; the structural fix is sound and no live Windows repro gap introduces risk beyond what the PR author already disclosed. All findings are P2. The logic of the two-change fix is correct, the tests cover the new paths, and the only note is a stale mock setup in one existing test that has no effect on correctness. No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/infra/process-respawn.test.ts
Line: 108-119
Comment:
**Dead mock setup in launchd test**
`relaunchGatewayScheduledTaskMock.mockReturnValue(...)` is configured in this test but the assertion immediately below is `expect(relaunchGatewayScheduledTaskMock).not.toHaveBeenCalled()`. The mock return value is never exercised, making the setup dead code. This carried over from the pre-PR version that used `triggerOpenClawRestartMock` to prove launchd never touches a failing restart — that framing made sense then, but against `relaunchGatewayScheduledTask` specifically the return value is unreachable. The test still passes and the core assertion is correct; only the stale mock setup is misleading.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(gateway): SIGUSR1 restart fast path ..." | Re-trigger Greptile |
…htasks Addresses openclaw#40932 (gateway restart fails from exec tool) without introducing the Windows regression that caused closed PR openclaw#68507 to be reverted. Two structural changes: * Add a SIGUSR1 fast path to `triggerOpenClawRestart()`. When a SIGUSR1 listener is registered (i.e. the gateway run-loop is up), route the restart through `scheduleGatewaySigusr1Restart` instead of spawning an external `systemctl` / `launchctl` / `schtasks` process. This restores the openclaw#40932 fix for direct callers such as the exec tool, which got no such protection previously. * Route the Windows schtasks supervisor-handoff in `restartGatewayProcessWithFreshPid` directly to `relaunchGatewayScheduledTask`, mirroring the launchd handling. Going through `triggerOpenClawRestart` from that codepath would re-enter the SIGUSR1 fast path from inside the SIGUSR1 drain/exit sequence, skipping `relaunchGatewayScheduledTask` — which is the only thing that tells the Windows Scheduled Task to start a new instance after exit. The gateway would stop instead of restart. This was Codex's P1 on closed PR openclaw#68507. Tests: * `process-respawn.test.ts`: update the Windows supervised test to assert `relaunchGatewayScheduledTask` is called directly with `process.env`; add a test covering schtasks failure propagation. * `restart-trigger.test.ts` (new): cover the SIGUSR1 fast path on linux and win32, plus the platform-dispatch fall-through when no listener is registered (addresses Greptile's P2 from openclaw#68507). AI-assisted: drafted and implemented with Claude Code. Testing level: lightly tested (targeted unit tests pass; no live Windows schtasks repro available from this machine — the fix is validated structurally by ensuring the schtasks branch of process-respawn no longer re-enters `triggerOpenClawRestart`). Refs: openclaw#40932, closed openclaw#68507 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the `relaunchGatewayScheduledTaskMock.mockReturnValue(...)` call in the "launchd supervisor never calls the Windows scheduled-task relaunch" test. The mock return value is unreachable because the assertion immediately below is `not.toHaveBeenCalled()` — the function under test never reaches that mock, so configuring its return value was dead code carried over from when this test was written against `triggerOpenClawRestartMock`. The semantics of the test are preserved: launchd-supervised darwin must not invoke the scheduled-task relaunch path. That assertion is what actually matters; the stale mock setup just made it look like the test was exercising failure injection, which it isn't. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c555642 to
ce61aaf
Compare
|
ProjectClownfish pushed a narrow repair to this branch so the original contributor path can stay canonical. Source PR: #68853 |
|
Codex review: needs maintainer review before merge. What this changes: The PR adds a SIGUSR1 fast path to Maintainer follow-up before merge: This is an open implementation PR with cross-platform restart semantics and no narrow blocking review finding for an automated repair lane; the next action is maintainer rebase/review against current main and adjacent Windows restart work. Security review: Security review cleared: The diff changes gateway restart control flow, tests, and changelog text, and does not add CI, dependency, lockfile, package publishing, secret handling, or new third-party code execution surface. Review detailsBest possible solution: Rebase this PR onto current main, reconcile it with the merged #69056 Windows fallback and the open Windows restart-health follow-up #73889, then land the focused direct schtasks handoff and trigger fast path if maintainers accept that as the intended restart contract. Do we have a high-confidence way to reproduce the issue? Unclear: code inspection gives a high-confidence structural reproduction of the current-main gap, because the schtasks branch still routes through Is this the best way to solve the issue? Unclear: the two-part split is narrow and plausible, and I found no definite blocking bug in the diff. It still needs maintainer review after rebase because the current call graph already has Windows SIGUSR1 fallback behavior from #69056 and related restart-health work remains open in #73889. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 4babd925c40d. |
Summary
Addresses #40932 (gateway restart fails from exec tool) without reintroducing the Windows regression that caused closed PR #68507 to be reverted.
Background
PR #68507 tried to fix #40932 by adding a SIGUSR1 fast path to
triggerOpenClawRestart(). Codex correctly flagged that as a P1 on Windows: the gateway run-loop always registers a SIGUSR1 listener, so the fast path fired whentriggerOpenClawRestartwas re-entered fromrestartGatewayProcessWithFreshPid's schtasks branch — bypassingrelaunchGatewayScheduledTask, which is the only thing that tells the Windows Scheduled Task to start a new instance after exit. Gateway stopped instead of restarting.The structural issue:
triggerOpenClawRestarthad two caller classes with contradictory needs.restartGatewayProcessWithFreshPidschtasks branch) is already inside the SIGUSR1 drain/exit sequence and needsrelaunchGatewayScheduledTaskto fire before the process exits. Taking the SIGUSR1 fast path there coalesces with the in-flight restart and skips the Scheduled Task spawn.Fix
Two changes that, together, restore the #40932 fix without the Windows regression:
triggerOpenClawRestart— when a SIGUSR1 listener is registered (i.e. the gateway run-loop is up), route throughscheduleGatewaySigusr1Restartinstead of spawningsystemctl/launchctl/schtasks. This is the original fix(gateway): try SIGUSR1 restart before spawning external processes #68507 change, now safe because of (2).relaunchGatewayScheduledTaskcall in the schtasks branch ofrestartGatewayProcessWithFreshPid. Mirrors the existing launchd handling (which also bypassestriggerOpenClawRestartand relies on its supervisor-specific helper). Prevents the re-entry into the SIGUSR1 fast path that broke fix(gateway): try SIGUSR1 restart before spawning external processes #68507.Tests
process-respawn.test.ts(updated) — mock renamed fromtriggerOpenClawRestartMocktorelaunchGatewayScheduledTaskMock. Existing Windows-supervised test now assertsrelaunchGatewayScheduledTaskis called withprocess.envdirectly. New test covers schtasks failure propagation asmode: "failed".restart-trigger.test.ts(new) — addresses Greptile's P2 from fix(gateway): try SIGUSR1 restart before spawning external processes #68507:method: "sigusr1"when a listener is registered (Linux)process-respawn.tslater in the drain/exit cycle, not hereVerification
oxlint— clean (0 warnings, 0 errors on the 5 changed files)oxfmt --check— cleantsgo:core— cleanNo live Windows schtasks end-to-end repro was available on my machine; the fix is validated structurally — the schtasks branch of
process-respawn.tsno longer routes throughtriggerOpenClawRestart, so the re-entry loop Codex flagged cannot happen.AI-assisted
Drafted and implemented with Claude Code. Testing level: lightly tested (targeted unit tests pass; no live Windows schtasks end-to-end repro available). I reviewed the full call graph and understand the structural change before authoring this.
Refs: #40932, closed PR #68507