Skip to content

fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks#68853

Closed
grimmjoww wants to merge 3 commits intoopenclaw:mainfrom
grimmjoww:fix/gateway-restart-supervisor-handoff
Closed

fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks#68853
grimmjoww wants to merge 3 commits intoopenclaw:mainfrom
grimmjoww:fix/gateway-restart-supervisor-handoff

Conversation

@grimmjoww
Copy link
Copy Markdown

Summary

Addresses #40932 (gateway restart fails from exec tool) without reintroducing the Windows regression that caused closed PR #68507 to be reverted.

Background

PR #68507 tried to fix #40932 by adding a SIGUSR1 fast path to triggerOpenClawRestart(). Codex correctly flagged that as a P1 on Windows: the gateway run-loop always registers a SIGUSR1 listener, so the fast path fired when triggerOpenClawRestart was re-entered from restartGatewayProcessWithFreshPid's schtasks branch — bypassing relaunchGatewayScheduledTask, which is the only thing that tells the Windows Scheduled Task to start a new instance after exit. Gateway stopped instead of restarting.

The structural issue: triggerOpenClawRestart had two caller classes with contradictory needs.

  • Top-level callers (CLI, exec tool, any direct caller) want "restart however possible" — SIGUSR1 fast path is correct here.
  • Supervisor-handoff caller (restartGatewayProcessWithFreshPid schtasks branch) is already inside the SIGUSR1 drain/exit sequence and needs relaunchGatewayScheduledTask to fire before the process exits. Taking the SIGUSR1 fast path there coalesces with the in-flight restart and skips the Scheduled Task spawn.

Fix

Two changes that, together, restore the #40932 fix without the Windows regression:

  1. SIGUSR1 fast path in triggerOpenClawRestart — when a SIGUSR1 listener is registered (i.e. the gateway run-loop is up), route through scheduleGatewaySigusr1Restart instead of spawning systemctl / launchctl / schtasks. This is the original fix(gateway): try SIGUSR1 restart before spawning external processes #68507 change, now safe because of (2).
  2. Direct relaunchGatewayScheduledTask call in the schtasks branch of restartGatewayProcessWithFreshPid. Mirrors the existing launchd handling (which also bypasses triggerOpenClawRestart and relies on its supervisor-specific helper). Prevents the re-entry into the SIGUSR1 fast path that broke fix(gateway): try SIGUSR1 restart before spawning external processes #68507.

Tests

  • process-respawn.test.ts (updated) — mock renamed from triggerOpenClawRestartMock to relaunchGatewayScheduledTaskMock. Existing Windows-supervised test now asserts relaunchGatewayScheduledTask is called with process.env directly. New test covers schtasks failure propagation as mode: "failed".
  • restart-trigger.test.ts (new) — addresses Greptile's P2 from fix(gateway): try SIGUSR1 restart before spawning external processes #68507:
    • SIGUSR1 fast path returns method: "sigusr1" when a listener is registered (Linux)
    • SIGUSR1 fast path takes precedence on Windows too — the scheduled-task relaunch is handled by process-respawn.ts later in the drain/exit cycle, not here
    • Falls through to platform dispatch when no listener is registered

Verification

  • oxlint — clean (0 warnings, 0 errors on the 5 changed files)
  • oxfmt --check — clean
  • tsgo:core — clean
  • 19/19 tests pass in the two affected test files

No live Windows schtasks end-to-end repro was available on my machine; the fix is validated structurally — the schtasks branch of process-respawn.ts no longer routes through triggerOpenClawRestart, so the re-entry loop Codex flagged cannot happen.

AI-assisted

Drafted and implemented with Claude Code. Testing level: lightly tested (targeted unit tests pass; no live Windows schtasks end-to-end repro available). I reviewed the full call graph and understand the structural change before authoring this.

Refs: #40932, closed PR #68507

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 19, 2026

Greptile Summary

This PR restores the SIGUSR1 fast path for triggerOpenClawRestart (originally from closed #68507) while fixing the Windows regression: restartGatewayProcessWithFreshPid now calls relaunchGatewayScheduledTask directly in the schtasks branch instead of routing through triggerOpenClawRestart, eliminating the re-entry that caused the Scheduled Task spawn to be skipped. Tests are well-structured and cover the new mode: "failed" propagation and all three SIGUSR1 dispatch cases.

Confidence Score: 5/5

Safe to merge; the structural fix is sound and no live Windows repro gap introduces risk beyond what the PR author already disclosed.

All findings are P2. The logic of the two-change fix is correct, the tests cover the new paths, and the only note is a stale mock setup in one existing test that has no effect on correctness.

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/process-respawn.test.ts
Line: 108-119

Comment:
**Dead mock setup in launchd test**

`relaunchGatewayScheduledTaskMock.mockReturnValue(...)` is configured in this test but the assertion immediately below is `expect(relaunchGatewayScheduledTaskMock).not.toHaveBeenCalled()`. The mock return value is never exercised, making the setup dead code. This carried over from the pre-PR version that used `triggerOpenClawRestartMock` to prove launchd never touches a failing restart — that framing made sense then, but against `relaunchGatewayScheduledTask` specifically the return value is unreachable. The test still passes and the core assertion is correct; only the stale mock setup is misleading.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(gateway): SIGUSR1 restart fast path ..." | Re-trigger Greptile

Comment thread src/infra/process-respawn.test.ts
Willie Stewart and others added 3 commits April 27, 2026 22:35
…htasks

Addresses openclaw#40932 (gateway restart fails from exec tool) without
introducing the Windows regression that caused closed PR openclaw#68507 to be
reverted.

Two structural changes:

* Add a SIGUSR1 fast path to `triggerOpenClawRestart()`. When a
  SIGUSR1 listener is registered (i.e. the gateway run-loop is up),
  route the restart through `scheduleGatewaySigusr1Restart` instead of
  spawning an external `systemctl` / `launchctl` / `schtasks` process.
  This restores the openclaw#40932 fix for direct callers such as the exec
  tool, which got no such protection previously.

* Route the Windows schtasks supervisor-handoff in
  `restartGatewayProcessWithFreshPid` directly to
  `relaunchGatewayScheduledTask`, mirroring the launchd handling.
  Going through `triggerOpenClawRestart` from that codepath would
  re-enter the SIGUSR1 fast path from inside the SIGUSR1 drain/exit
  sequence, skipping `relaunchGatewayScheduledTask` — which is the
  only thing that tells the Windows Scheduled Task to start a new
  instance after exit. The gateway would stop instead of restart.
  This was Codex's P1 on closed PR openclaw#68507.

Tests:
* `process-respawn.test.ts`: update the Windows supervised test to
  assert `relaunchGatewayScheduledTask` is called directly with
  `process.env`; add a test covering schtasks failure propagation.
* `restart-trigger.test.ts` (new): cover the SIGUSR1 fast path on
  linux and win32, plus the platform-dispatch fall-through when no
  listener is registered (addresses Greptile's P2 from openclaw#68507).

AI-assisted: drafted and implemented with Claude Code. Testing level:
lightly tested (targeted unit tests pass; no live Windows schtasks
repro available from this machine — the fix is validated structurally
by ensuring the schtasks branch of process-respawn no longer re-enters
`triggerOpenClawRestart`).

Refs: openclaw#40932, closed openclaw#68507

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the `relaunchGatewayScheduledTaskMock.mockReturnValue(...)` call
in the "launchd supervisor never calls the Windows scheduled-task
relaunch" test. The mock return value is unreachable because the
assertion immediately below is `not.toHaveBeenCalled()` — the function
under test never reaches that mock, so configuring its return value
was dead code carried over from when this test was written against
`triggerOpenClawRestartMock`.

The semantics of the test are preserved: launchd-supervised darwin must
not invoke the scheduled-task relaunch path. That assertion is what
actually matters; the stale mock setup just made it look like the test
was exercising failure injection, which it isn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vincentkoc vincentkoc force-pushed the fix/gateway-restart-supervisor-handoff branch from c555642 to ce61aaf Compare April 27, 2026 22:40
@vincentkoc
Copy link
Copy Markdown
Member

ProjectClownfish pushed a narrow repair to this branch so the original contributor path can stay canonical.

Source PR: #68853
Validation: pnpm -s vitest run src/infra/process-respawn.test.ts src/infra/restart-trigger.test.ts src/infra/restart.test.ts; pnpm check:changed
Contributor credit is preserved in the branch history and PR context.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs maintainer review before merge.

What this changes:

The PR adds a SIGUSR1 fast path to triggerOpenClawRestart(), changes Windows schtasks-supervised restart/update handoff to call relaunchGatewayScheduledTask() directly, updates restart types/tests, and adds a changelog entry.

Maintainer follow-up before merge:

This is an open implementation PR with cross-platform restart semantics and no narrow blocking review finding for an automated repair lane; the next action is maintainer rebase/review against current main and adjacent Windows restart work.

Security review:

Security review cleared: The diff changes gateway restart control flow, tests, and changelog text, and does not add CI, dependency, lockfile, package publishing, secret handling, or new third-party code execution surface.

Review details

Best possible solution:

Rebase this PR onto current main, reconcile it with the merged #69056 Windows fallback and the open Windows restart-health follow-up #73889, then land the focused direct schtasks handoff and trigger fast path if maintainers accept that as the intended restart contract.

Do we have a high-confidence way to reproduce the issue?

Unclear: code inspection gives a high-confidence structural reproduction of the current-main gap, because the schtasks branch still routes through triggerOpenClawRestart() and the trigger has no SIGUSR1 fast path. There is still no live Windows Scheduled Task end-to-end reproduction for the regression scenario attached to this PR.

Is this the best way to solve the issue?

Unclear: the two-part split is narrow and plausible, and I found no definite blocking bug in the diff. It still needs maintainer review after rebase because the current call graph already has Windows SIGUSR1 fallback behavior from #69056 and related restart-health work remains open in #73889.

Acceptance criteria:

  • pnpm test src/infra/process-respawn.test.ts src/infra/restart-trigger.test.ts src/infra/restart.test.ts
  • pnpm check:changed

What I checked:

  • Current main lacks trigger fast path: triggerOpenClawRestart() on current main goes from test-mode handling directly to stale PID cleanup and platform dispatch; there is no process.listenerCount("SIGUSR1") fast path or method: "sigusr1" return. (src/infra/restart.ts:540, 4babd925c40d)
  • Current Windows schtasks handoff still re-enters restart trigger: Both supervised Windows branches still call triggerOpenClawRestart() from process-respawn.ts, which is exactly the seam the PR changes to call the scheduled-task helper directly. (src/infra/process-respawn.ts:51, 4babd925c40d)
  • Current restart attempt type has no sigusr1 method: RestartAttempt.method on current main is limited to launchctl, systemd, schtasks, and supervisor, matching the absence of the PR's new sigusr1 trigger result. (src/infra/restart.types.ts:3, 4babd925c40d)
  • Merged adjacent Windows fallback does not supersede the PR: Current emitGatewayRestart() already emits SIGUSR1 through the run-loop on Windows when a listener exists and falls back to triggerOpenClawRestart() otherwise, but that is separate from adding a fast path inside triggerOpenClawRestart() or changing process-respawn.ts schtasks handoff. (src/infra/restart.ts:270, 4babd925c40d)
  • Current tests still assert the old schtasks seam: The current Windows supervised respawn test still mocks triggerOpenClawRestart() and expects it to be called for schtasks-supervised restart, while the PR changes that expectation to the scheduled-task helper. (src/infra/process-respawn.test.ts:179, 4babd925c40d)
  • PR review context addressed prior bot note: Greptile's dead mock setup review comment was followed by PR commit 23d9a9e49fd3c2bb6da7651ef060a796d34366c1, and the provided PR context says ProjectClownfish later pushed a narrow repair commit ce61aaf8a5bec7a14402543d97befcb2b0bc0ad6. (src/infra/process-respawn.test.ts:108, ce61aaf8a5be)

Likely related people:

  • steipete: Local history shows repeated recent work in gateway restart behavior, including fix(gateway): use launchd KeepAlive restarts; current blame for the affected files is dominated by Peter Steinberger due the current grafted main snapshot. (role: recent gateway restart maintainer; confidence: high; commits: a65ab607c746, 680eff63fbf8, 0640db72b059; files: src/infra/restart.ts, src/infra/process-respawn.ts, src/cli/gateway-cli/run-loop.ts)
  • vincentkoc: The PR discussion says vincentkoc pushed the latest narrow repair commit to this branch, and local history shows Vincent Koc split restart attempt types in the same restart type surface. (role: recent maintainer and branch repair owner; confidence: high; commits: ce61aaf8a5be, 3d60ed0544e3; files: src/infra/restart.types.ts, src/infra/process-respawn.ts, src/infra/restart.ts)
  • Ayaan Zaidi: Local history identifies commit 05c240fad6fea1caed99ddf8b2d093eff4ee1f31 as introducing Windows gateway restart via Scheduled Task, which is the platform handoff protected by this PR. (role: Windows scheduled-task restart introducer; confidence: medium; commits: 05c240fad6fe; files: src/infra/process-respawn.ts, src/infra/windows-task-restart.ts, src/daemon/schtasks.ts)
  • Thatgfsj: The related merged fix(gateway): handle SIGUSR1 gracefully on Windows #69056 context credits Thatgfsj with the Windows SIGUSR1 fallback work that is now present on current main and must be reconciled with this PR. (role: adjacent Windows SIGUSR1 fallback author; confidence: medium; commits: 4b8f248a1247; files: src/infra/restart.ts, src/infra/infra-runtime.test.ts)

Remaining risk / open question:

  • No live Windows schtasks end-to-end reproduction is attached; the PR validation is structural unit coverage plus targeted checks from the PR body.
  • The PR base predates current main and the merged fix(gateway): handle SIGUSR1 gracefully on Windows #69056 Windows fallback; a rebase may require careful review of restart semantics and changelog placement.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4babd925c40d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] openclaw gateway restart command fails and Gateway does not restart automatically (macOS LaunchAgent)

2 participants