fix(gateway): detect macOS launchd in service-restart path #29181
Open
zhonghui5207 wants to merge 1 commit into
The /restart command uses an environment-variable probe to decide
between two restart strategies:
- service-restart path: exit with code 75 so a service manager
(systemd / launchd) relaunches us
- detached-subprocess path: spawn a new gateway via setsid + bash
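The two strategies can be sketched as a small decision function. This is an illustrative model only: the names RestartPlan, plan_restart and SERVICE_RESTART_EXIT_CODE are hypothetical and not taken from gateway/run.py, and the relaunch command is a placeholder rather than the gateway's real launch line.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Exit code named in the description above; the constant name is hypothetical.
SERVICE_RESTART_EXIT_CODE = 75


@dataclass(frozen=True)
class RestartPlan:
    via_service: bool
    exit_code: int
    detached_argv: Optional[Sequence[str]]  # command to spawn, if any


def plan_restart(under_service: bool, relaunch_cmd: str = "exec gateway") -> RestartPlan:
    """Pick a restart strategy from the result of the service-manager probe."""
    if under_service:
        # Service-restart path: exit with 75 and let systemd/launchd relaunch us.
        return RestartPlan(True, SERVICE_RESTART_EXIT_CODE, None)
    # Detached path: new session (setsid) running bash, then a clean exit.
    # "exec gateway" is a placeholder command, not the real one.
    return RestartPlan(False, 0, ["setsid", "bash", "-c", relaunch_cmd])
```

The sketch only returns a plan; actually spawning the process or exiting is left to the caller.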
The probe only checked for systemd's INVOCATION_ID env var, so on
macOS launchd it always picked the detached path. The gateway then
exited with code 0, which launchd's KeepAlive { SuccessfulExit: false }
policy interprets as "stopped successfully — do not relaunch", leaving
the gateway down until manually bootstrapped.
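The launchd behaviour described above can be modelled in a few lines. This is a simplified sketch, assuming a job that exits normally (signals, crashes and launchd's throttling are ignored); launchd_would_relaunch and JOB_FRAGMENT are hypothetical names, and the fragment shows only the KeepAlive key, not a complete job definition.

```python
import plistlib

# Fragment of a launchd job definition carrying the policy quoted above.
# A real plist also needs Label, ProgramArguments and so on.
JOB_FRAGMENT = {"KeepAlive": {"SuccessfulExit": False}}

# The same fragment serialised in the XML plist format.
PLIST_XML = plistlib.dumps(JOB_FRAGMENT).decode()


def launchd_would_relaunch(exit_code: int, keep_alive: dict) -> bool:
    """Simplified model of KeepAlive/SuccessfulExit for a normally exiting job.

    SuccessfulExit=False: relaunch only after a non-zero exit.
    SuccessfulExit=True:  relaunch only after a zero exit.
    """
    successful_exit = keep_alive["SuccessfulExit"]
    return (exit_code == 0) == successful_exit
```

Under this model, exit code 0 leaves the job stopped while exit code 75 causes a relaunch, which matches the failure described in this commit message.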
Extend the probe to also recognise launchd by checking the
XPC_SERVICE_NAME env var (launchd injects this for managed jobs;
INVOCATION_ID is systemd-specific).
Tests:
- New test_restart_command_uses_service_restart_under_launchd verifies
that XPC_SERVICE_NAME triggers via_service=True.
- The existing detached-without-systemd test now also clears
XPC_SERVICE_NAME so it asserts the true "no service manager" case.
What does this PR do?
Fixes /restart (and other _handle_restart_command code paths) so that they actually trigger a launchd-driven relaunch on macOS, instead of exiting cleanly and leaving the gateway down.
Bug
After running /restart on a macOS host where the gateway is managed by launchd, the gateway drains and exits with code 0. Because the launchd plist uses KeepAlive { SuccessfulExit: false }, launchd treats the clean exit as "stopped on purpose" and refuses to relaunch. The user has to manually run launchctl kickstart -k to bring the gateway back up. The same flow works correctly on Linux/systemd.
Root cause
The gateway picks its restart strategy in gateway/run.py (~line 9720).
INVOCATION_ID is set only by systemd. macOS launchd uses XPC_SERVICE_NAME / XPC_FLAGS instead, never INVOCATION_ID. So on macOS, _under_service is always False, the detached-subprocess branch is taken, and via_service=False flows to the exit path.
SystemExit(75) is the contract launchd recognises ("unsuccessful exit" → KeepAlive relaunches). Because the SystemExit(75) branch is skipped when via_service is False, the function returns True → sys.exit(0), which launchd interprets as a successful, intentional stop. The gateway stays down.
Fix
Extend the probe to recognise launchd as a service manager:
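A minimal sketch of what such an extended probe might look like, assuming the probe reduces to a boolean over environment variables. The helper under_service_manager and the constant SERVICE_MANAGER_ENV_VARS are hypothetical; the actual expression in gateway/run.py may be written differently.

```python
import os
from typing import Mapping, Optional

# Environment markers named in this PR: INVOCATION_ID (systemd) and
# XPC_SERVICE_NAME (launchd).
SERVICE_MANAGER_ENV_VARS = ("INVOCATION_ID", "XPC_SERVICE_NAME")


def under_service_manager(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any known service-manager marker is set and non-empty.

    Hypothetical helper, written for illustration only.
    """
    if env is None:
        env = os.environ
    return any(env.get(name) for name in SERVICE_MANAGER_ENV_VARS)
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment. Treating an empty value as unset is a choice made in this sketch, not something the PR states.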
XPC_SERVICE_NAME is injected by launchd for every managed job and is launchd-specific (no false positives on Linux). With this change, /restart on macOS takes the via_service=True branch and exits with code 75, which launchd treats as an unsuccessful exit, so it relaunches the gateway.
Related Issue
Fixes #29180
Type of Change
Changes Made
- gateway/run.py: extend the _under_service probe to also recognise launchd (XPC_SERVICE_NAME)
- tests/gateway/test_restart_notification.py
  - test_restart_command_uses_service_restart_under_launchd asserts that XPC_SERVICE_NAME triggers via_service=True
  - test_restart_command_uses_detached_without_systemd now also clears XPC_SERVICE_NAME so it asserts the genuine "no service manager" case
How to Test
Automated:
26 tests pass on macOS 15.4 / Python 3.11.14.
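For orientation, the environment handling in the two tests listed under Changes Made could look roughly like the sketch below. It is not the code from tests/gateway/test_restart_notification.py: the real tests assert via_service=True on the restart call, whereas this sketch asserts directly on a stand-in probe, and every function name here is hypothetical. The value ai.hermes.gateway is borrowed from the launchd label used in the manual steps.

```python
import os
from unittest import mock

_MARKERS = ("INVOCATION_ID", "XPC_SERVICE_NAME")


def _under_service() -> bool:
    # Stand-in for the gateway's probe, so the sketch runs on its own.
    return any(os.environ.get(name) for name in _MARKERS)


def _env_without_service_markers() -> dict:
    # Copy of the current environment with both markers removed, so the
    # outcome does not depend on how the test runner itself was started.
    return {k: v for k, v in os.environ.items() if k not in _MARKERS}


def test_service_restart_under_launchd():
    env = _env_without_service_markers()
    env["XPC_SERVICE_NAME"] = "ai.hermes.gateway"
    with mock.patch.dict(os.environ, env, clear=True):
        assert _under_service() is True


def test_detached_without_service_manager():
    with mock.patch.dict(os.environ, _env_without_service_markers(), clear=True):
        assert _under_service() is False
```

Clearing both variables in the second test is what makes it a genuine "no service manager" case, since the shell running the tests may itself have inherited one of them.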
Manual (macOS):
- hermes gateway install).
- launchctl list ai.hermes.gateway shows a PID.
- /restart to the bot.
- launchctl list ai.hermes.gateway showing a new PID within ~1 second (instead of staying on the stale one).
- ~/.hermes/logs/gateway-exit-diag.log should now contain a SystemExit code=75 entry followed immediately by a new gateway.start entry.
Checklist
Code
- fix(gateway): …)
- pytest tests/gateway/test_restart_notification.py -v and all 26 tests pass
Documentation & Housekeeping
- cli-config.yaml.example keys touched
- test_restart_command_uses_service_restart_under_systemd still passes)