Summary
On macOS, respawnGatewayProcessForUpdate() (and restartGatewayProcessWithFreshPid()) trusts detectRespawnSupervisor() to decide whether launchd will restart the gateway. The detector returns "launchd" if any of LAUNCH_JOB_LABEL, LAUNCH_JOB_NAME, XPC_SERVICE_NAME, or OPENCLAW_LAUNCHD_LABEL is set.
But XPC_SERVICE_NAME is inherited by any child process of a launchd-managed parent. When OpenClaw's GUI app (ai.openclaw.mac) spawns the gateway as a child — or any custom supervisor inherits launchd env from its own parent — the gateway misidentifies itself as launchd-supervised.
Result: gateway writes a gateway-supervisor-restart-handoff.json with supervisorMode: "launchd" and exits cleanly, expecting launchd to restart it. But launchd has no ai.openclaw.gateway service registered (only ai.openclaw.mac for the parent). The gateway never comes back.
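The inheritance claim can be checked outside OpenClaw. The following is a minimal sketch, not OpenClaw code: it sets a made-up XPC_SERVICE_NAME value in a parent Node process and confirms that a child spawned with the default environment sees the same value. The value and the helper name readChildEnv are illustrative assumptions.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical value; on a real Mac, launchd sets this for the GUI app.
const INHERITED_VALUE = "application.ai.openclaw.mac.example";

// Spawn a child Node process with the default (inherited) environment and
// return the value it sees for the given variable name.
function readChildEnv(name) {
  const script = `process.stdout.write(process.env[${JSON.stringify(name)}] ?? "")`;
  const result = spawnSync(process.execPath, ["-e", script], { encoding: "utf8" });
  if (result.error) throw result.error;
  return result.stdout;
}

// Simulate the launchd-managed parent, then look at what the child receives.
process.env.XPC_SERVICE_NAME = INHERITED_VALUE;
const seenByChild = readChildEnv("XPC_SERVICE_NAME");
console.log(seenByChild === INHERITED_VALUE);
```

Nothing in the child registers it with launchd; the variable is simply copied along with the rest of the parent's environment, which is why it is a poor signal for "this process is a launchd service".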
Environment
- OpenClaw: confirmed in 2026.5.19 (where I first hit it) and verified still present in 2026.5.20 (currently installed) by reading dist/supervisor-markers-B5EgETF5.js and dist/cli/gateway-lifecycle.runtime.js.
- Node: 25.x
- OS: macOS 15 (Darwin 25.4)
- Trigger: any user running the OpenClaw GUI app whose ai.openclaw.gateway LaunchAgent has been unloaded (e.g. by a prior mode=reload restart script that did launchctl bootout + a failed launchctl bootstrap, or by doctor's legacy-service cleanup). Trigger event: any SIGUSR1 / update.run restart, even a dry-run status=skipped one.
Code-level trace (against 2026.5.20)
dist/supervisor-markers-B5EgETF5.js:
const SUPERVISOR_HINTS = {
  launchd: ["LAUNCH_JOB_LABEL", "LAUNCH_JOB_NAME", "XPC_SERVICE_NAME", "OPENCLAW_LAUNCHD_LABEL"],
  // ...
};
function detectRespawnSupervisor(env = process.env, platform = process.platform) {
  if (platform === "darwin") return hasAnyHint(env, SUPERVISOR_HINTS.launchd) ? "launchd" : null;
  // ...
}
dist/cli/gateway-lifecycle.runtime.js:
function respawnGatewayProcessForUpdate(opts = {}) {
  if (isTruthy(process.env.OPENCLAW_NO_RESPAWN)) return { mode: "disabled", detail: "OPENCLAW_NO_RESPAWN" };
  const supervisor = detectRespawnSupervisor(process.env);
  if (supervisor) {
    if (supervisor === "schtasks") { /* ... */ }
    return { mode: "supervised" }; // ← false positive on darwin
  }
  // fallback: spawnDetachedGatewayProcess(...)
}
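To make the false positive concrete, here is a self-contained sketch of the detection path traced above. SUPERVISOR_HINTS and the darwin branch are copied from the 2026.5.20 listing; hasAnyHint is not shown in the trace, so its body here (any listed variable set to a non-empty string) is an assumption, and the environment values are invented.

```javascript
const SUPERVISOR_HINTS = {
  launchd: ["LAUNCH_JOB_LABEL", "LAUNCH_JOB_NAME", "XPC_SERVICE_NAME", "OPENCLAW_LAUNCHD_LABEL"],
};

// Assumed implementation: true if any hint variable is a non-empty string.
function hasAnyHint(env, names) {
  return names.some((name) => typeof env[name] === "string" && env[name].trim() !== "");
}

// Darwin branch only, as in the trace; other platforms are elided here.
function detectRespawnSupervisor(env = process.env, platform = process.platform) {
  if (platform === "darwin") return hasAnyHint(env, SUPERVISOR_HINTS.launchd) ? "launchd" : null;
  return null;
}

// Gateway spawned as a child of the GUI app: only the inherited variable is set.
const childOfGuiApp = { XPC_SERVICE_NAME: "application.ai.openclaw.mac.example" };
// Gateway started with no launchd-related variables at all.
const plainShell = { PATH: "/usr/bin:/bin" };

const detectedForChild = detectRespawnSupervisor(childOfGuiApp, "darwin");
const detectedForShell = detectRespawnSupervisor(plainShell, "darwin");
console.log(detectedForChild, detectedForShell);
```

Under these assumptions the child of the GUI app is reported as "launchd" even though no OpenClaw-specific marker is present, which is the input that sends respawnGatewayProcessForUpdate() down the { mode: "supervised" } branch.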
Observed sequence
09:32:30  update.run dry-run: current 2026.5.19 → target 2026.5.20, status=skipped
09:32:47  gateway PID 84633 receives SIGUSR1
09:33:17  drain timeout (2 tasks + 1 embedded run still active)
09:33:18  shutdown completed cleanly; "restart mode: update process respawn (supervisor restart)"
          → writes handoff.json with supervisorMode=launchd, sleeps 1500ms, exit(0)
[gap]     launchd does NOT restart anything (no ai.openclaw.gateway service registered)
09:33:21  OpenClaw GUI app fallback-spawns a new gateway child (PPID = GUI app, XPC inherited)
09:33:24  new gateway calls cleanStaleGatewayProcessesSync, kills PID 15647 (leftover on :18789)
09:33:57  another banner: yet another spawn, but it hits the same bug and exits
[after]   no further gateway log entries; gateway is gone for ~2.5h until I manually
          `launchctl bootstrap`ed ai.openclaw.gateway.plist
Why this matters
Catastrophic and silent: the user's chat bots, agents, and integrations all go offline with no error visible to the gateway operator. Recovery requires CLI/launchctl knowledge to discover the service is unloaded.
Proposed fix
In src/infra/supervisor-markers.ts, narrow darwin detection to OpenClaw's own explicit marker so inherited generic launchd env vars don't trigger a false positive:
function detectRespawnSupervisor(env, platform) {
  if (platform === "darwin") {
    // Only trust the openclaw-specific marker; XPC_SERVICE_NAME and friends
    // are inherited by any child of a launchd-managed process and do not
    // mean *this* process is registered as a launchd service.
    return env.OPENCLAW_LAUNCHD_LABEL?.trim() ? "launchd" : null;
  }
  // ...
}
As a belt-and-suspenders measure, the detector could also verify that the service is actually registered before returning "launchd", e.g. via launchctl print "gui/$(id -u)/$LABEL".
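One possible shape for that check, sketched under several assumptions: the helper names (runLaunchctlPrint, isLaunchdServiceRegistered, detectLaunchdSupervisor) are hypothetical, the gateway is assumed to live in the per-user gui domain, and any non-zero exit from launchctl print is treated as "not registered". The command runner is injectable so the logic can be exercised without launchd.

```javascript
const { spawnSync } = require("node:child_process");

// Default runner: returns the exit status of `launchctl print <target>`,
// or null if the command could not be started at all.
function runLaunchctlPrint(target) {
  const result = spawnSync("launchctl", ["print", target], { stdio: "ignore" });
  return result.error ? null : result.status;
}

// Hypothetical helper: true only when the runner reports success for the target.
function isLaunchdServiceRegistered(label, uid, run = runLaunchctlPrint) {
  return run(`gui/${uid}/${label}`) === 0;
}

// Hypothetical narrowed detector: require the OpenClaw-specific marker AND a
// registered service before claiming launchd supervision.
function detectLaunchdSupervisor(env, uid = process.getuid(), run = runLaunchctlPrint) {
  const label = env.OPENCLAW_LAUNCHD_LABEL?.trim();
  if (!label) return null;
  return isLaunchdServiceRegistered(label, uid, run) ? "launchd" : null;
}

// Stub runners standing in for launchctl (uid 501 is an arbitrary example).
const serviceLoaded = (target) => (target === "gui/501/ai.openclaw.gateway" ? 0 : 1);
const serviceMissing = () => 1;

const markerEnv = { OPENCLAW_LAUNCHD_LABEL: "ai.openclaw.gateway" };
const inheritedOnlyEnv = { XPC_SERVICE_NAME: "application.ai.openclaw.mac.example" };

const whenLoaded = detectLaunchdSupervisor(markerEnv, 501, serviceLoaded);
const whenMissing = detectLaunchdSupervisor(markerEnv, 501, serviceMissing);
const whenInheritedOnly = detectLaunchdSupervisor(inheritedOnlyEnv, 501, serviceLoaded);
console.log(whenLoaded, whenMissing, whenInheritedOnly);
```

Returning null when the check fails sends the caller to the detached-spawn fallback, so a stale marker with an unloaded service would no longer lead to a clean exit that nothing restarts. Whether the cost of spawning launchctl on every restart is acceptable is a judgment call for the maintainers.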
Operators who run gateway under launchd should ensure ai.openclaw.gateway.plist sets OPENCLAW_LAUNCHD_LABEL=ai.openclaw.gateway in its EnvironmentVariables. Worth adding this to the bundled plist generator too, so the marker is set by default.
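As an illustration of what the generator change could look like, the sketch below renders an EnvironmentVariables fragment for a LaunchAgent plist. The function names renderEnvironmentVariables and escapeXml are hypothetical; the real generator's structure is not shown in this report.

```javascript
// Escape the characters that are significant in XML text content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: render the EnvironmentVariables key and its <dict>
// for inclusion in a LaunchAgent plist.
function renderEnvironmentVariables(vars) {
  const entries = Object.entries(vars)
    .map(([key, value]) => `    <key>${escapeXml(key)}</key>\n    <string>${escapeXml(value)}</string>`)
    .join("\n");
  return `  <key>EnvironmentVariables</key>\n  <dict>\n${entries}\n  </dict>`;
}

const fragment = renderEnvironmentVariables({ OPENCLAW_LAUNCHD_LABEL: "ai.openclaw.gateway" });
console.log(fragment);
```

Setting the marker to the same string as the plist's Label keeps the two in sync, so any registration check keyed on the marker looks up the service that launchd actually knows about.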
Related (not duplicates)
- detectRespawnSupervisor returning null under a custom supervisor, leading to orphan + EADDRINUSE. This issue is detectRespawnSupervisor returning "launchd" when it shouldn't.
Workaround
Set OPENCLAW_NO_RESPAWN=1 to force in-process restart (loses the fresh-module-graph benefit on real update.run upgrades, but survives spurious dry-run restarts).
Or: make sure ai.openclaw.gateway is bootstrapped into launchd before relying on update-triggered restarts.