Skip to content

Avoid double-starting the macOS gateway after launchd plist refresh#5110

Open
RaviTharuma wants to merge 2 commits into
NousResearch:mainfrom
RaviTharuma:fix/avoid-double-start-after-launchd-refresh
Open

Avoid double-starting the macOS gateway after launchd plist refresh#5110
RaviTharuma wants to merge 2 commits into
NousResearch:mainfrom
RaviTharuma:fix/avoid-double-start-after-launchd-refresh

Conversation

@RaviTharuma

@RaviTharuma RaviTharuma commented Apr 4, 2026

Copy link
Copy Markdown

Summary

  • avoid issuing a second launchctl start after a plist refresh already performed unload + load
  • prevent duplicate local gateway instances during macOS launchd refresh flows

Why

The Hermes launchd plist uses RunAtLoad. Reloading the plist already starts the gateway again, so an immediate extra launchctl start can race a second local gateway process into existence and trigger token/session lock conflicts.

Fixes #5109

Verification

  • python -m py_compile hermes_cli/gateway.py
  • manual macOS stop/start lifecycle verification

Not run

  • no automated launchd regression test in CI yet

On macOS the gateway plist uses RunAtLoad. Refreshing the plist unloads and reloads the job, which already starts the gateway again. Calling launchctl start immediately after that can race a second local gateway process into existence and create token/session lock conflicts across Telegram, Discord, and WhatsApp.\n\nConstraint: launchd reload semantics differ from systemd and the Hermes plist has RunAtLoad enabled.\nRejected: Keep the extra launchctl start for symmetry with the non-refresh path | can spawn duplicate local gateway instances.\nConfidence: medium\nScope-risk: narrow\nDirective: If refresh logic changes, preserve the rule that unload+load with RunAtLoad must not be followed by an unconditional second start.\nTested: python -m py_compile hermes_cli/gateway.py; manual stop/start lifecycle on macOS\nNot-tested: Automated launchd regression test in CI
@trevorgordon981

Copy link
Copy Markdown

LGTM. Clean, correct, targeted fix for a real race.

Verified the logic:

  1. refresh_launchd_plist_if_needed() returns True only when it has actually written a new plist AND called launchctl unload + launchctl load
  2. The Hermes plist has RunAtLoad=true, so launchctl load auto-starts the service
  3. The subsequent explicit launchctl start was redundant in the refresh path — at best a no-op, at worst spawning a second process before the first one finished binding sockets

The fix correctly gates on the return value. Comment explains the reasoning. Nothing to add to the logic itself.

One tiny defensive improvement to consider:

refresh_launchd_plist_if_needed() uses check=False on both the unload and load subprocess calls:

subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
subprocess.run(["launchctl", "load", str(plist_path)], check=False)

It returns True if it attempted the refresh, not if the refresh actually succeeded. Worst case: launchctl load fails for some reason (missing permissions, corrupt plist, whatever), refresh_launchd_plist_if_needed() returns True anyway, and this PR's early return prints "✓ Service started" — but no service is running.

Low probability in practice, and matches the existing behavior of the function. Worth noting for a future iteration, but not a blocker.

Ship it. This is exactly the kind of race I was debugging with my own launchd services earlier today — the combination of KeepAlive + RunAtLoad + explicit start commands is a minefield, and every small fix like this helps.

refresh_launchd_plist_if_needed() now returns False when launchctl load
fails, so launchd_start() falls through to the normal error-handled
start path instead of falsely printing 'Service started'.
@RaviTharuma

Copy link
Copy Markdown
Author

Good catch — addressed in 2b95c2b. refresh_launchd_plist_if_needed() now captures the launchctl load return code and returns False on failure, so launchd_start() falls through to the normal error-handled start path instead of falsely claiming success.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/cli CLI entry point, hermes_cli/, setup wizard labels May 1, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Fixes #5109. Related to macOS launchd cluster: #12374, #5589.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

launchd gateway refresh can start a second local gateway on macOS

3 participants