fix: /restart uses via_service=True on launchd (macOS) #33393

Open
zhan9168 wants to merge 1 commit into NousResearch:main from zhan9168:fix/restart-launchd-exit-code

@zhan9168

Problem

On macOS, Hermes Gateway runs under launchd. When a user sends /restart, the gateway checks for INVOCATION_ID (systemd) or container files to decide whether to use service-aware restart (exit code 75) vs detached restart (exit code 0).

launchd does not set INVOCATION_ID and macOS is not a container, so /restart fell through to the detached path — SystemExit(0) — which launchd does not restart from.

Fix

Detect launchd via XPC_SERVICE_NAME, which launchd injects into all managed jobs. When present, use via_service=True so the gateway exits with code 75 and launchd's KeepAlive → SuccessfulExit: false policy restarts it immediately.

_under_launchd = bool(os.environ.get("XPC_SERVICE_NAME"))  # launchd sets this on macOS
if _under_service or _in_container or _under_launchd:
    self.request_restart(detached=False, via_service=True)
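
The decision described above can be sketched as a standalone function. This is a minimal illustration rather than the gateway's actual code: `restart_exit_code` and its `in_container` parameter are hypothetical names introduced here, and only the environment variable names and the two exit codes (75 for the service-aware path, 0 for the detached path) come from this description. It mirrors the truthiness check exactly as written in this PR.

```python
from typing import Mapping

# Exit codes as described in this PR.
SERVICE_RESTART_EXIT = 75   # non-zero exit; a supervisor that restarts on failure relaunches the job
DETACHED_RESTART_EXIT = 0   # clean exit; launchd with SuccessfulExit: false leaves the job down


def restart_exit_code(env: Mapping[str, str], in_container: bool = False) -> int:
    """Hypothetical helper: pick the exit code /restart should use."""
    under_systemd = bool(env.get("INVOCATION_ID"))      # set by systemd
    under_launchd = bool(env.get("XPC_SERVICE_NAME"))   # set by launchd (the check this PR adds)
    if under_systemd or in_container or under_launchd:
        return SERVICE_RESTART_EXIT
    return DETACHED_RESTART_EXIT
```

Passing the environment as a mapping, instead of reading `os.environ` inside the function, keeps the sketch easy to exercise with fake environments.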

Verification

Manual test confirmed: /restart now triggers launchd auto-restart on macOS.

macOS launchd does not set INVOCATION_ID, so /restart was falling
through to the detached restart path (exit code 0) instead of the
service-aware restart path (exit code 75).

Use XPC_SERVICE_NAME as the launchd indicator — it is set by launchd
for all managed jobs and survives across process lifetimes.
@alt-glitch added the type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels on May 27, 2026
@alt-glitch

Collaborator

Duplicate of #19940 — same fix: detect launchd via XPC_SERVICE_NAME in gateway/run.py _under_service probe. Also duplicated by #29181, #24898, #24954 (all flagged as dups of #19940).

@chazmaniandinkle

Confirming this fix from production use, not just a manual test.

We've been running the exact same two-line change (XPC_SERVICE_NAME added to the _under_service probe) on a macOS node since early June, across two gateway profiles managed by separate launchd jobs that share one checkout. Before the patch, /restart exited 0 and launchd left the gateway down. After it, /restart exits 75 and KeepAlive { SuccessfulExit: false } relaunches it every time, on both the default and a named-profile gateway.

XPC_SERVICE_NAME is the right signal. launchd injects it into every managed job, so it's reliably present for any deployment set up with hermes gateway install and absent in foreground/dev runs, which is exactly the discrimination you want. No false positives observed.

One adjacent gap worth a separate fix (not this PR): hermes update on macOS only restarts the active profile's launchd label, so other profile gateways keep running pre-update code until restarted by hand. That's #38053. This PR is the correct fix for the /restart path itself.

@chazmaniandinkle

Follow-up to my earlier confirmation: the handler this patches has since moved to gateway/slash_commands.py, and we found a hazard in the bare bool(os.environ.get("XPC_SERVICE_NAME")) probe in production: interactive macOS shells inherit XPC_SERVICE_NAME=0, a truthy string, so an unsupervised hermes gateway run in a terminal would get routed to the service path and exit 75 with nothing to revive it. Opened #43888 with the probe relocated to the current handler, the "0" value excluded, and regression tests.
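
The hazard can be illustrated with a small sketch. This is not the code from #43888: `under_launchd` is a hypothetical helper name, and the only facts taken from the comment above are that `"0"` is a truthy string in Python and that it should be excluded from the probe.

```python
import os
from typing import Mapping, Optional


def under_launchd(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical probe: True only for a real launchd job label."""
    env = os.environ if env is None else env
    value = env.get("XPC_SERVICE_NAME", "")
    # bool("0") is True in Python, so a bare truthiness check would
    # misclassify an interactive shell that inherits XPC_SERVICE_NAME=0.
    return value not in ("", "0")
```

With this shape, a terminal session carrying `XPC_SERVICE_NAME=0` is treated like an unset variable and falls through to the detached path, while any other non-empty value still selects the service path.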
