Summary
There are still a few residual service-management bugs in hermes gateway after #1567.
They all show up when a local service definition exists but the service manager state is stale or broken:
- install only checks whether the plist/unit file exists, so a stale service definition is skipped instead of repaired.
- launchd start does not recover when launchctl start ai.hermes.gateway returns exit status 3 because the job is unloaded.
- gateway restart swallows launchd/systemd restart failures and falls back to a foreground run_gateway(), which can make the service look recovered while the background service is still broken.
- launchd status only reports whether the job is loaded; it does not surface that a local plist exists and is stale or out of sync with the current install.
Environment
- OS: macOS 15.6 / Darwin 24.6.0 x86_64
- Python: 3.11.14
- Hermes: Hermes Agent v0.2.0 (2026.3.12)
- Repo state: main at cfa87e77
Reproduction
1. Stale service definition is skipped by install
- Install the gateway service.
- Move or rename the repo, or otherwise make the generated WorkingDirectory / ProgramArguments differ from the installed plist.
- Run hermes gateway install again without --force.
Observed:
- The command exits with "Service already installed" and leaves the stale plist/unit untouched.
Expected:
- If the local service definition exists but no longer matches the current install, install should repair it automatically.
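One way install could decide whether to repair is to compare the installed plist with the definition it would generate now, instead of only checking that the file exists. The sketch below is a minimal illustration of that idea, not Hermes' actual code: generated_plist, plist_is_stale and install are hypothetical names, and the plist contents are placeholders limited to the keys mentioned above plus the standard Label key. A systemd unit could be checked the same way by comparing the rendered unit text. Reloading a launchd job whose plist was rewritten while loaded is left out.

```python
import plistlib
import tempfile
from pathlib import Path
from xml.parsers.expat import ExpatError


def generated_plist(repo_dir: Path, program_args: list[str]) -> dict:
    """Hypothetical stand-in for the definition install would write today."""
    return {
        "Label": "ai.hermes.gateway",
        "WorkingDirectory": str(repo_dir),
        "ProgramArguments": program_args,
    }


def plist_is_stale(plist_path: Path, expected: dict) -> bool:
    """True if the plist on disk is unreadable or differs from `expected`."""
    try:
        with plist_path.open("rb") as fh:
            installed = plistlib.load(fh)
    except (OSError, ValueError, ExpatError):
        return True  # missing or corrupt definitions also need rewriting
    return installed != expected


def install(plist_path: Path, expected: dict, force: bool = False) -> str:
    """Write the plist unless an identical one is already installed."""
    existed = plist_path.exists()
    if existed and not force and not plist_is_stale(plist_path, expected):
        return "Service already installed"
    plist_path.parent.mkdir(parents=True, exist_ok=True)
    with plist_path.open("wb") as fh:
        plistlib.dump(expected, fh)
    return "Service definition rewritten" if existed else "Service installed"


# Usage: install, install again unchanged, then "move" the repo and install again.
with tempfile.TemporaryDirectory() as tmp:
    plist = Path(tmp) / "ai.hermes.gateway.plist"
    args = ["/usr/bin/python3", "-m", "placeholder_gateway"]  # placeholder command
    first = install(plist, generated_plist(Path("/old/repo"), args))
    again = install(plist, generated_plist(Path("/old/repo"), args))
    moved = install(plist, generated_plist(Path("/new/repo"), args))
    with plist.open("rb") as fh:
        repaired_dir = plistlib.load(fh)["WorkingDirectory"]
```

Because the comparison is against the freshly generated definition, a repo move (which changes WorkingDirectory) is enough to trigger a rewrite, while re-running install on an unchanged setup stays a no-op.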
2. launchd start cannot self-heal an unloaded job
- Have ~/Library/LaunchAgents/ai.hermes.gateway.plist present.
- Ensure the launchd job is not loaded (for example, launchctl unload ~/Library/LaunchAgents/ai.hermes.gateway.plist).
- Run hermes gateway start.
Observed:
- launchctl start ai.hermes.gateway returns exit status 3, and the CLI does not retry with launchctl load.
Expected:
- If the plist exists locally and start fails because the job is unloaded, Hermes should load the plist and retry once.
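A sketch of the retry-once behaviour, assuming (as observed above) that exit status 3 from launchctl start means the job is not loaded. launchd_start, run_cmd and FakeLaunchctl are hypothetical names; the command runner is injected so the logic can be exercised without macOS.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

LABEL = "ai.hermes.gateway"
Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Default runner: execute argv and return its exit status."""
    return subprocess.run(list(argv), capture_output=True).returncode


def launchd_start(plist_path: Path, run: Runner = run_cmd) -> int:
    """Start the job; if launchd reports it unloaded, load the plist and retry once."""
    rc = run(["launchctl", "start", LABEL])
    # Assumption taken from this issue: exit status 3 means the job is not loaded.
    if rc == 3 and plist_path.exists():
        load_rc = run(["launchctl", "load", str(plist_path)])
        if load_rc != 0:
            return load_rc  # surface the load failure instead of retrying blindly
        rc = run(["launchctl", "start", LABEL])  # single retry, no loop
    return rc


class FakeLaunchctl:
    """Test double: 'start' fails with 3 until 'load' has been called."""

    def __init__(self, loaded: bool) -> None:
        self.loaded = loaded
        self.calls: list[list[str]] = []

    def __call__(self, argv: Sequence[str]) -> int:
        self.calls.append(list(argv))
        if argv[1] == "load":
            self.loaded = True
            return 0
        return 0 if self.loaded else 3


# Usage: an unloaded job whose plist exists on disk.
with tempfile.TemporaryDirectory() as tmp:
    plist = Path(tmp) / f"{LABEL}.plist"
    plist.write_bytes(b"")  # only existence matters to this sketch
    fake = FakeLaunchctl(loaded=False)
    status = launchd_start(plist, run=fake)
```

Retrying exactly once keeps a genuinely broken job from looping, and a failed load is returned as-is so the user sees the real error.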
3. gateway restart masks broken service state
- Have a gateway plist/unit file present.
- Put the service manager into a broken/unloaded state so that the service restart fails.
- Run hermes gateway restart.
Observed:
- The restart failure is swallowed.
- Hermes falls back to run_gateway(verbose=False) in the foreground.
- This makes the gateway appear recovered while the managed background service is still broken.
Expected:
- If a service definition exists but service restart fails, the command should report that failure clearly and exit non-zero instead of silently switching to a foreground process.
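A sketch of how restart could surface the failure instead of masking it. gateway_restart and its three callables are hypothetical stand-ins for Hermes' own helpers (a caller would pass something like lambda: run_gateway(verbose=False) as run_foreground). The sketch also assumes the foreground path is still wanted when no service definition exists at all, which this issue does not specify.

```python
import sys
from typing import Callable


def gateway_restart(
    service_installed: Callable[[], bool],
    restart_service: Callable[[], int],
    run_foreground: Callable[[], None],
) -> int:
    """Return the exit status the CLI should use for gateway restart."""
    if not service_installed():
        # Assumption: with no service definition at all, a foreground run is still wanted.
        run_foreground()
        return 0
    rc = restart_service()
    if rc != 0:
        # A definition exists, so a failed restart is a real error: report it and
        # do NOT start a foreground gateway that would hide the broken service.
        print(
            f"error: gateway service restart failed (exit status {rc}); "
            "the managed service may still be broken.",
            file=sys.stderr,
        )
        return rc
    return 0
```

A CLI entry point would then pass the returned status to sys.exit() so the failure is visible to scripts and to the user.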
4. launchd status lacks local/stale plist diagnostics
- Leave a stale or outdated ~/Library/LaunchAgents/ai.hermes.gateway.plist on disk.
- Ensure the job is not loaded.
- Run hermes gateway status.
Observed:
- Output only says the service is not loaded.
- It does not mention the local plist path, whether the plist is stale, or that a repair/start command would reload it.
Expected:
- status should show the local plist path and whether it matches the current generated service definition, so the user can distinguish "not installed" from "installed but stale/unloaded".
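A sketch of status output that separates "not installed" from "installed but stale/unloaded". describe_launchd_status is a hypothetical helper, and how the loaded state is obtained is left to the caller (for example, from the exit status of launchctl list ai.hermes.gateway, which is an assumption here). The hints only rely on the existing --force flag and plain launchctl load, so they do not depend on the other fixes.

```python
import plistlib
import tempfile
from pathlib import Path
from xml.parsers.expat import ExpatError


def describe_launchd_status(plist_path: Path, expected: dict, loaded: bool) -> str:
    """Build a status report that distinguishes missing, stale and unloaded plists."""
    if not plist_path.exists():
        return "Gateway service: not installed (no plist found)"
    try:
        with plist_path.open("rb") as fh:
            in_sync = plistlib.load(fh) == expected
    except (OSError, ValueError, ExpatError):
        in_sync = False  # an unreadable plist is reported as stale
    lines = [
        f"Gateway service plist: {plist_path}",
        "Matches current install: " + ("yes" if in_sync else "no (stale)"),
        "launchd job: " + ("loaded" if loaded else "not loaded"),
    ]
    if not in_sync:
        lines.append("Hint: run hermes gateway install --force to regenerate the plist.")
    elif not loaded:
        lines.append(f"Hint: run launchctl load {plist_path} to load the job.")
    return "\n".join(lines)


# Usage: no plist, then a plist for an old repo location, then an in-sync but unloaded one.
with tempfile.TemporaryDirectory() as tmp:
    plist = Path(tmp) / "ai.hermes.gateway.plist"
    current = {"Label": "ai.hermes.gateway", "WorkingDirectory": "/new/repo"}
    missing_report = describe_launchd_status(plist, current, loaded=False)
    plist.write_bytes(plistlib.dumps({**current, "WorkingDirectory": "/old/repo"}))
    stale_report = describe_launchd_status(plist, current, loaded=False)
    plist.write_bytes(plistlib.dumps(current))
    unloaded_report = describe_launchd_status(plist, current, loaded=False)
```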
Why this matters
These are all bug-fix / robustness issues in the service-management path:
- They affect macOS specifically, which is one of Hermes' supported platforms.
- They make service recovery brittle after repo moves or failed loads.
- They hide true background service failures behind a foreground fallback.
- They make diagnosis harder than it needs to be.
I have a patch ready that adds targeted recovery + tests for these cases and will open a PR linked to this issue.