fix(cli): complete launchd restart after self-restart signal #36240

Open: konsisumer wants to merge 1 commit into NousResearch:main from konsisumer:fix/launchd-restart-kickstart-after-sigusr1

@konsisumer

Contributor

What does this PR do?

Fixes a macOS gateway restart regression where launchd_restart() returned immediately after sending SIGUSR1 when the gateway PID was an ancestor of the caller (for example when hermes update ran inside the gateway process tree). This change removes the early-exit behavior for that path so restart consistently completes with a launchd kickstart.
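
To illustrate the condition described above (the gateway PID being an ancestor of the caller), here is a minimal sketch of an ancestor check. It is not the code in hermes_cli/gateway.py: `is_ancestor` and the injected `parent_of` lookup are hypothetical names used only for illustration, and a real implementation would need some way to read parent PIDs from the OS.

```python
from typing import Callable, Optional


def is_ancestor(
    candidate_pid: int,
    start_pid: int,
    parent_of: Callable[[int], Optional[int]],
    max_depth: int = 64,
) -> bool:
    """Return True if candidate_pid appears in start_pid's chain of parents.

    parent_of is a hypothetical lookup that returns the parent PID of a
    process, or None when the parent is unknown. max_depth bounds the walk
    so a malformed (cyclic) lookup cannot loop forever.
    """
    pid = parent_of(start_pid)
    depth = 0
    while pid is not None and pid > 0 and depth < max_depth:
        if pid == candidate_pid:
            return True
        pid = parent_of(pid)
        depth += 1
    return False


# Toy process tree (child -> parent): 300 was spawned by 200, which was
# spawned by 100 (standing in for the gateway), which was spawned by PID 1.
toy_tree = {300: 200, 200: 100, 100: 1}
gateway_is_ancestor = is_ancestor(100, 300, toy_tree.get)
unrelated_is_ancestor = is_ancestor(999, 300, toy_tree.get)
```

In the toy tree, PID 100 is an ancestor of PID 300, which corresponds to the situation where `hermes update` runs inside the gateway process tree.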

Related Issue

Fixes #11932

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • hermes_cli/gateway.py: in launchd_restart(), replaced the early return after the SIGUSR1 self-restart request with a drain wait (_wait_for_gateway_exit), after which execution continues to launchctl kickstart -k, as in the other restart path.
  • tests/hermes_cli/test_gateway_service.py: updated the self-restart test to assert the SIGUSR1 request, the drain wait call with the configured timeout, and the launchctl kickstart -k execution.
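
The sequence described in the two items above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual diff: `restart_via_launchd`, its parameters, and the service target `gui/501/example.gateway` are hypothetical, and the three callables are injected so the ordering can be checked without touching a real process or launchd. `signal.SIGUSR1` is only available on Unix-like platforms.

```python
import signal
from typing import Callable, Sequence
from unittest import mock


def restart_via_launchd(
    gateway_pid: int,
    service_target: str,
    drain_timeout: float,
    send_signal: Callable[[int, int], None],
    wait_for_exit: Callable[[int, float], bool],
    run_command: Callable[[Sequence[str]], int],
) -> int:
    """Request a graceful restart, wait for the drain, then kickstart.

    All names here are hypothetical stand-ins for the real helpers.
    """
    # 1. Ask the running gateway to drain and restart itself.
    send_signal(gateway_pid, signal.SIGUSR1)
    # 2. Wait up to drain_timeout seconds for the old process to exit.
    #    The result is ignored in this sketch: the kickstart below runs
    #    either way, with no early return.
    wait_for_exit(gateway_pid, drain_timeout)
    # 3. Always finish with a kickstart; -k kills a still-running
    #    instance before launchd restarts the service.
    return run_command(["launchctl", "kickstart", "-k", service_target])


def demo_call_order() -> list:
    """Run the sketch against mocks and return the recorded calls in order."""
    parent = mock.Mock()
    parent.run_command.return_value = 0
    restart_via_launchd(
        4321,
        "gui/501/example.gateway",  # hypothetical launchd service target
        30.0,
        send_signal=parent.send_signal,
        wait_for_exit=parent.wait_for_exit,
        run_command=parent.run_command,
    )
    return parent.mock_calls
```

Recording all three collaborators on one parent mock makes the ordering explicit, which mirrors the three assertions the updated test is described as making.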

How to Test

  1. Run targeted restart tests: pytest tests/hermes_cli/test_gateway_service.py -q -k launchd_restart
  2. Run Windows footgun guard on changed files: python scripts/check-windows-footguns.py hermes_cli/gateway.py tests/hermes_cli/test_gateway_service.py
  3. Run lint on changed files: ruff check hermes_cli/gateway.py tests/hermes_cli/test_gateway_service.py

What platforms tested on

  • Linux (worker sandbox), Python from shared project venv

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Linux (sandbox)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

@alt-glitch added the labels type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), and comp/cli (CLI entry point, hermes_cli/, setup wizard) on Jun 1, 2026
@alt-glitch

Collaborator

Related to open PR #35178 (drain-aware SIGUSR1 in launchd_restart, fixes #27745) and the broader launchd-restart cluster (#34230, #31218). This addresses the same ancestor-PID early-return path. Maintainers may want to consolidate these macOS launchd restart fixes.

@konsisumer

Contributor Author

Thanks @alt-glitch — I looked at #35178.

Both PRs touch launchd_restart() and tests/hermes_cli/test_gateway_service.py, so they would conflict. The bugs they fix are related but distinct:

Should I:
a) Close this PR in favor of #35178 (if that PR's approach subsumes this fix), or
b) Rebase on top of #35178 once it merges, or
c) Proceed independently since the bugs are distinct?

Happy to consolidate if that's the preference — just need guidance on which approach to take.


Note: the CI failure in test_model_catalog.py (minimax model ID mismatch) is a pre-existing main-health issue unrelated to this PR's changes.

Development

Successfully merging this pull request may close these issues.

macOS: launchd_restart() returns early after SIGUSR1, leaving gateway permanently dead

2 participants