Bug Description
On macOS, hermes gateway start can fail because launchctl start ai.hermes.gateway exits with status 3 when the launchd plist is missing. The current start path assumes the plist already exists and only attempts a refresh if it does. If the plist has been removed for any reason, start does not recover and the gateway cannot be restarted without a manual reinstall.
This is related to the older launchd recovery work in #1613 / #1614, but that fix only covers the case where the launchd job is unloaded while the plist still exists. It does not cover the missing-plist case.
Steps to Reproduce
- On macOS, have Hermes gateway managed via launchd.
- End up in a state where the launchd plist is missing.
- I observed this after a hermes gateway stop, though I cannot prove from this report alone that stop itself is the direct deletion path.
- Regardless of how the plist disappears, start should recover from that state.
- Run hermes gateway start.
- Observe that launchctl start ai.hermes.gateway fails with exit status 3.
Expected Behavior
hermes gateway start should self-heal if the plist is missing by:
- regenerating the plist
- loading it with launchctl load
- starting the service
Users should not need to run hermes gateway install manually just to recover from a missing plist.
Actual Behavior
launchd_start() calls refresh_launchd_plist_if_needed(), but that function returns immediately if the plist file does not exist. Then Hermes tries to run:
launchctl start ai.hermes.gateway
Because the service definition is not loaded, launchctl exits with status 3 and Hermes crashes.
Environment
- OS: macOS
- Hermes version: Hermes Agent v0.4.0 (2026.3.23)
- Project root: /Users/john/.hermes/hermes-agent
- Python: 3.11.15
- OpenAI SDK: 2.29.0
- Installed repo commit: b8b1f24fd755ae187a0fbaedf5c9657a2af1ef1e
- Last commit message at inspected install: b8b1f24f fix: handle addition-only hunks in V4A patch parser (#3325)
- Hermes entrypoint used: /Users/john/.local/bin/hermes
- Gateway service code inspected: /Users/john/.hermes/hermes-agent/hermes_cli/gateway.py
Error Output
(base) john@MacBook-Pro-2 whatsapp-bridge % hermes gateway stop
✓ Stopped 1 gateway process(es)
(base) john@MacBook-Pro-2 whatsapp-bridge % hermes gateway start
Traceback (most recent call last):
  File "/Users/john/.local/bin/hermes", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/john/.hermes/hermes-agent/hermes_cli/main.py", line 4181, in main
    args.func(args)
  File "/Users/john/.hermes/hermes-agent/hermes_cli/main.py", line 546, in cmd_gateway
    gateway_command(args)
  File "/Users/john/.hermes/hermes-agent/hermes_cli/gateway.py", line 1746, in gateway_command
    launchd_start()
  File "/Users/john/.hermes/hermes-agent/hermes_cli/gateway.py", line 869, in launchd_start
    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['launchctl', 'start', 'ai.hermes.gateway']' returned non-zero exit status 3.
Code Evidence (current installed code)
From /Users/john/.hermes/hermes-agent/hermes_cli/gateway.py:
1. Missing plist is explicitly ignored by refresh logic
def refresh_launchd_plist_if_needed() -> bool:
    plist_path = get_launchd_plist_path()
    if not plist_path.exists() or launchd_plist_is_current():
        return False
2. Start only retries when the plist already exists
def launchd_start():
    refresh_launchd_plist_if_needed()
    plist_path = get_launchd_plist_path()
    try:
        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    except subprocess.CalledProcessError as e:
        if e.returncode != 3 or not plist_path.exists():
            raise
        print("↻ launchd job was unloaded; reloading service definition")
        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
So if the plist is missing:
- refresh_launchd_plist_if_needed() does nothing
- launchctl start fails
- the retry path is skipped because not plist_path.exists() is true
- Hermes raises instead of self-healing
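The control flow above can be reproduced without touching launchd. The following is only a sketch: fake_launchctl, current_launchd_start, and reproduces_failure are hypothetical names, the fake runner stands in for a launchctl whose job is not loaded, and the refresh step is left out because it does nothing when the plist is missing.

```python
import subprocess
import tempfile
from pathlib import Path


def fake_launchctl(cmd, check=True):
    # Hypothetical stand-in for launchctl: every "start" behaves as if the
    # job is not loaded and fails with exit status 3. "load" does nothing.
    if cmd[1] == "start":
        raise subprocess.CalledProcessError(3, cmd)


def current_launchd_start(plist_path: Path, run=fake_launchctl) -> None:
    # Mirrors the retry guard quoted from gateway.py above.
    try:
        run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    except subprocess.CalledProcessError as e:
        if e.returncode != 3 or not plist_path.exists():
            raise  # missing plist: the error propagates, no recovery
        run(["launchctl", "load", str(plist_path)], check=True)
        run(["launchctl", "start", "ai.hermes.gateway"], check=True)


def reproduces_failure() -> int:
    # Point at a plist path that does not exist and report the exit status
    # carried by the exception that escapes.
    with tempfile.TemporaryDirectory() as tmp:
        missing = Path(tmp) / "ai.hermes.gateway.plist"
        try:
            current_launchd_start(missing)
        except subprocess.CalledProcessError as e:
            return e.returncode
    return 0
```

Calling reproduces_failure() exercises the same branch as the traceback in the Error Output section: the guard re-raises because the plist path does not exist.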
Existing related issue / distinction from prior fix
There is an older related issue:
That fix addressed the case where:
- the launchd plist exists
- but the job is unloaded
It does not fix the case reported here:
- the plist file itself is missing
So this issue should be treated as either:
Test Evidence
There is already a test that appears to encode the current non-recovery behavior for missing plist:
tests/hermes_cli/test_update_gateway_restart.py
test_refresh_skips_when_no_plist
This reinforces that the missing-plist case is still currently unhandled.
Why this matters
Even if hermes gateway stop is not intended to delete the plist, users can still end up in a missing-plist state through:
- launchd state drift
- manual cleanup
- older versions / failed upgrades
- migration or reinstall edge cases
- filesystem cleanup or local service repair attempts
hermes gateway start should be robust to this and repair the missing service definition automatically.
Suggested Fix
In launchd_start():
- if the plist is missing:
  - regenerate it
  - write it to the expected launchd path
  - launchctl load <plist>
- then run launchctl start ai.hermes.gateway
In other words, missing plist should be treated as recoverable state.
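A minimal sketch of this recovery flow, under stated assumptions: launchd_start_with_recovery, the generate_plist callback, and the injectable run parameter are hypothetical and do not exist in the Hermes codebase. A real fix would presumably reuse whatever hermes gateway install already uses to build the plist.

```python
import subprocess
from pathlib import Path
from typing import Callable

LABEL = "ai.hermes.gateway"


def launchd_start_with_recovery(
    plist_path: Path,
    generate_plist: Callable[[], str],
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Start the gateway; recreate and load the plist first if it is missing.

    Returns True when the plist had to be recreated.
    """
    recreated = False
    if not plist_path.exists():
        # Assumed recovery step: rebuild the service definition and load it.
        plist_path.parent.mkdir(parents=True, exist_ok=True)
        plist_path.write_text(generate_plist(), encoding="utf-8")
        run(["launchctl", "load", str(plist_path)], check=True)
        recreated = True
    try:
        run(["launchctl", "start", LABEL], check=True)
    except subprocess.CalledProcessError as e:
        # Keep the existing behavior for "plist exists but job is unloaded".
        # If the plist was just loaded, a second load would not help.
        if e.returncode != 3 or recreated:
            raise
        run(["launchctl", "load", str(plist_path)], check=True)
        run(["launchctl", "start", LABEL], check=True)
    return recreated
```

Passing run as a parameter is a design choice for testability only; it lets a test record the launchctl commands instead of executing them.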
Suggested Regression Test
Add a launchd test covering the missing-plist case:
- monkeypatch get_launchd_plist_path() to a nonexistent temp plist
- call launchd_start()
- assert that it:
  - creates the plist
  - loads it with launchctl load
  - starts it successfully
A good test name would be something like:
test_launchd_start_recreates_missing_plist_and_loads_service
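One possible shape for such a test, sketched so that it runs on its own. The launchd_start defined here is a stand-in that takes the plist path as an argument and writes placeholder content; the real test would instead monkeypatch get_launchd_plist_path() and call the actual function in hermes_cli/gateway.py.

```python
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def launchd_start(plist_path: Path) -> None:
    # Stand-in for the fixed function, for illustration only.
    if not plist_path.exists():
        plist_path.write_text("<plist/>", encoding="utf-8")  # placeholder content
        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)


def test_launchd_start_recreates_missing_plist_and_loads_service() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        plist_path = Path(tmp) / "ai.hermes.gateway.plist"
        assert not plist_path.exists()

        # Patch subprocess.run so no real launchctl command is executed.
        with mock.patch("subprocess.run") as fake_run:
            launchd_start(plist_path)

        # The plist was created, then loaded, then the service was started.
        assert plist_path.exists()
        commands = [call.args[0] for call in fake_run.call_args_list]
        assert commands == [
            ["launchctl", "load", str(plist_path)],
            ["launchctl", "start", "ai.hermes.gateway"],
        ]
```

Asserting on the exact command order also guards against a regression where start is attempted before the service definition is loaded.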