
macOS: hermes gateway start fails when launchd plist is missing; should recreate/load service #3318

@intobigdata

Description


Bug Description

On macOS, hermes gateway start can fail with launchctl start ai.hermes.gateway returning exit status 3 when the launchd plist is missing. The current start path assumes the plist already exists and only attempts a refresh if it does. If the plist has been removed for any reason, start does not recover, and the gateway cannot be restarted without a manual reinstall (hermes gateway install).

This is related to the older launchd recovery work in #1613 / #1614, but that fix only covers the case where the launchd job is unloaded while the plist still exists. It does not cover the missing-plist case.

Steps to Reproduce

  1. On macOS, have Hermes gateway managed via launchd.
  2. End up in a state where the launchd plist is missing.
    • I observed this after a hermes gateway stop, though I cannot prove from this report alone that stop itself is the direct deletion path.
    • Regardless of how the plist disappears, start should recover from that state.
  3. Run:
    hermes gateway start
  4. Observe that launchctl start ai.hermes.gateway fails with exit status 3.
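For a deterministic reproduction that does not depend on hermes gateway stop, the missing-plist state can be forced by unloading the job and moving its plist aside. This is only a sketch: the default path below assumes the usual per-user LaunchAgents location and is not confirmed by this report (get_launchd_plist_path() is the authority), and force_missing_plist is a hypothetical helper, not part of Hermes.

```python
import subprocess
import tempfile
from pathlib import Path

# ASSUMPTION: typical per-user LaunchAgents location; check get_launchd_plist_path().
DEFAULT_PLIST = Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"


def force_missing_plist(plist_path: Path = DEFAULT_PLIST, run=subprocess.run) -> Path:
    """Hypothetical helper: unload the gateway job and move its plist aside.

    Returns the backup path so the plist can be restored afterwards.
    """
    # check=False: unload fails harmlessly if the job is already unloaded.
    run(["launchctl", "unload", str(plist_path)], check=False)
    backup = plist_path.with_name(plist_path.name + ".bak")
    plist_path.rename(backup)
    return backup


# Dry run against a temporary file and a recording fake instead of launchctl:
seen = []
with tempfile.TemporaryDirectory() as tmp:
    plist = Path(tmp) / "ai.hermes.gateway.plist"
    plist.write_text("<plist/>")
    backup = force_missing_plist(plist, run=lambda cmd, check: seen.append(cmd))
    plist_missing = not plist.exists()
    backup_exists = backup.exists()
```

After running it for real, hermes gateway start should fail as shown under Error Output on affected versions. Restoring is the reverse: rename the .bak file back and launchctl load it.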

Expected Behavior

hermes gateway start should self-heal if the plist is missing by:

  1. regenerating the plist
  2. loading it with launchctl load
  3. starting the service

Users should not need to run hermes gateway install manually just to recover from a missing plist.

Actual Behavior

launchd_start() calls refresh_launchd_plist_if_needed(), but that function returns immediately if the plist file does not exist. Then Hermes tries to run:

launchctl start ai.hermes.gateway

Because the service definition is not loaded, launchctl exits with status 3, and Hermes crashes with an unhandled subprocess.CalledProcessError (see the traceback below).

Environment

  • OS: macOS
  • Hermes version: Hermes Agent v0.4.0 (2026.3.23)
  • Project root: /Users/john/.hermes/hermes-agent
  • Python: 3.11.15
  • OpenAI SDK: 2.29.0
  • Installed repo commit: b8b1f24fd755ae187a0fbaedf5c9657a2af1ef1e
  • Last commit message at inspected install:
    • b8b1f24f fix: handle addition-only hunks in V4A patch parser (#3325)
  • Hermes entrypoint used:
    • /Users/john/.local/bin/hermes
  • Gateway service code inspected:
    • /Users/john/.hermes/hermes-agent/hermes_cli/gateway.py

Error Output

(base) john@MacBook-Pro-2 whatsapp-bridge % hermes gateway stop
✓ Stopped 1 gateway process(es)

(base) john@MacBook-Pro-2 whatsapp-bridge % hermes gateway start
Traceback (most recent call last):
  File "/Users/john/.local/bin/hermes", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/john/.hermes/hermes-agent/hermes_cli/main.py", line 4181, in main
    args.func(args)
  File "/Users/john/.hermes/hermes-agent/hermes_cli/main.py", line 546, in cmd_gateway
    gateway_command(args)
  File "/Users/john/.hermes/hermes-agent/hermes_cli/gateway.py", line 1746, in gateway_command
    launchd_start()
  File "/Users/john/.hermes/hermes-agent/hermes_cli/gateway.py", line 869, in launchd_start
    subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['launchctl', 'start', 'ai.hermes.gateway']' returned non-zero exit status 3.

Code Evidence (current installed code)

From /Users/john/.hermes/hermes-agent/hermes_cli/gateway.py:

1. Missing plist is explicitly ignored by refresh logic

def refresh_launchd_plist_if_needed() -> bool:
    plist_path = get_launchd_plist_path()
    if not plist_path.exists() or launchd_plist_is_current():
        return False

2. Start only retries when the plist already exists

def launchd_start():
    refresh_launchd_plist_if_needed()
    plist_path = get_launchd_plist_path()
    try:
        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    except subprocess.CalledProcessError as e:
        if e.returncode != 3 or not plist_path.exists():
            raise
        print("↻ launchd job was unloaded; reloading service definition")
        subprocess.run(["launchctl", "load", str(plist_path)], check=True)
        subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)

So if the plist is missing:

  1. refresh_launchd_plist_if_needed() does nothing
  2. launchctl start fails
  3. the retry path is skipped because not plist_path.exists() is true
  4. Hermes raises instead of self-healing
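The control flow above can be reproduced off macOS with a fake launchctl. The function below copies the logic of the installed launchd_start() excerpt, but takes the plist path, the subprocess runner and the refresh hook as parameters; that injection is an assumption made only so the sketch runs without launchd.

```python
import subprocess
import tempfile
from pathlib import Path


def current_launchd_start(plist_path: Path, run, refresh) -> None:
    """Same logic as the installed launchd_start() excerpt, dependencies injected."""
    refresh()  # refresh_launchd_plist_if_needed(): a no-op when the plist is missing
    try:
        run(["launchctl", "start", "ai.hermes.gateway"], check=True)
    except subprocess.CalledProcessError as e:
        # Retry only when the plist still exists (the #1613 / #1614 case).
        if e.returncode != 3 or not plist_path.exists():
            raise
        run(["launchctl", "load", str(plist_path)], check=True)
        run(["launchctl", "start", "ai.hermes.gateway"], check=True)


calls: list[list[str]] = []


def unloaded_launchctl(cmd, check=True):
    """Fake subprocess.run: the job is not loaded, so 'start' exits with status 3."""
    calls.append(list(cmd))
    if cmd[1] == "start":
        raise subprocess.CalledProcessError(3, cmd)
    return subprocess.CompletedProcess(cmd, 0)


raised_status = None
with tempfile.TemporaryDirectory() as tmp:
    missing_plist = Path(tmp) / "ai.hermes.gateway.plist"  # never created
    try:
        current_launchd_start(missing_plist, unloaded_launchctl, refresh=lambda: False)
    except subprocess.CalledProcessError as exc:
        raised_status = exc.returncode

print(raised_status, calls)
```

The only recorded call is the failing start: no load and no second start are attempted, and the CalledProcessError with status 3 escapes, matching the traceback.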

Existing related issue / distinction from prior fix

There is an older related issue (the launchd recovery work in #1613 / #1614 referenced above).

That fix addressed the case where:

  • the launchd plist exists
  • but the job is unloaded

It does not fix the case reported here:

  • the plist file itself is missing

So this issue should be treated as either:

Test Evidence

There is already a test that appears to encode the current non-recovery behavior for missing plist:

  • tests/hermes_cli/test_update_gateway_restart.py
  • test_refresh_skips_when_no_plist

This reinforces that the missing-plist case is currently unhandled.

Why this matters

Even if hermes gateway stop is not intended to delete the plist, users can still end up in a missing-plist state through:

  • launchd state drift
  • manual cleanup
  • older versions / failed upgrades
  • migration or reinstall edge cases
  • filesystem cleanup or local service repair attempts

hermes gateway start should be robust to this and repair the missing service definition automatically.

Suggested Fix

In launchd_start():

  • if the plist is missing:
    • regenerate it
    • write it to the expected launchd path
    • launchctl load <plist>
  • then run launchctl start ai.hermes.gateway

In other words, missing plist should be treated as recoverable state.
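A minimal sketch of the suggested fix, written against injected dependencies so it can run off macOS. render_plist stands in for whatever Hermes uses to generate the plist (the report does not name that function, so the parameter is hypothetical), and run defaults to subprocess.run. The existing refresh_launchd_plist_if_needed() call for the stale-but-present case would still come first and is omitted here.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

LABEL = "ai.hermes.gateway"


def launchd_start_with_recovery(
    plist_path: Path,
    render_plist: Callable[[], str],  # HYPOTHETICAL: returns the plist XML text
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Sketch: treat a missing launchd plist as recoverable state."""
    if not plist_path.exists():
        print("↻ launchd plist is missing; recreating and loading service definition")
        # Regenerate the plist and write it to the expected launchd path.
        plist_path.parent.mkdir(parents=True, exist_ok=True)
        plist_path.write_text(render_plist(), encoding="utf-8")
        # Load it so launchd knows about the job before we try to start it.
        run(["launchctl", "load", str(plist_path)], check=True)
    try:
        run(["launchctl", "start", LABEL], check=True)
    except subprocess.CalledProcessError as e:
        # Unchanged #1613 / #1614 path: plist present but job unloaded.
        if e.returncode != 3 or not plist_path.exists():
            raise
        run(["launchctl", "load", str(plist_path)], check=True)
        run(["launchctl", "start", LABEL], check=True)


# Usage with a recording fake instead of the real launchctl:
calls: list[list[str]] = []


def fake_run(cmd, check=True):
    calls.append(list(cmd))
    return subprocess.CompletedProcess(cmd, 0)


with tempfile.TemporaryDirectory() as tmp:
    plist = Path(tmp) / "LaunchAgents" / f"{LABEL}.plist"
    launchd_start_with_recovery(plist, render_plist=lambda: "<plist/>", run=fake_run)
    created_text = plist.read_text(encoding="utf-8")
```

Writing the plist before launchctl load follows the order in the Suggested Fix list, and the except branch keeps the existing unloaded-job recovery, so the missing-plist and unloaded-job cases are handled independently.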

Suggested Regression Test

Add a launchd test covering the missing-plist case:

  • monkeypatch get_launchd_plist_path() to a nonexistent temp plist
  • call launchd_start()
  • assert that it:
    1. creates the plist
    2. loads it with launchctl load
    3. starts it successfully

A good test name would be something like:

  • test_launchd_start_recreates_missing_plist_and_loads_service
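The suggested regression test could look roughly like this. In the real suite, launchd_start would be imported from hermes_cli.gateway with get_launchd_plist_path() monkeypatched to a path under tmp_path; here a hypothetical stand-in with injected render_plist and run arguments keeps the sketch self-contained. FakeLaunchctl is also illustrative: it models launchd only far enough that start exits with status 3 until a load has happened, which is the failure seen in the traceback.

```python
import subprocess
import tempfile
from pathlib import Path

LABEL = "ai.hermes.gateway"


class FakeLaunchctl:
    """Records launchctl calls; 'start' exits 3 until the plist has been loaded."""

    def __init__(self) -> None:
        self.calls: list[list[str]] = []
        self.loaded = False

    def __call__(self, cmd, check=True):
        self.calls.append(list(cmd))
        if cmd[1] == "load":
            self.loaded = True
        elif cmd[1] == "start" and not self.loaded:
            if check:
                raise subprocess.CalledProcessError(3, cmd)
            return subprocess.CompletedProcess(cmd, 3)
        return subprocess.CompletedProcess(cmd, 0)


def launchd_start(plist_path: Path, render_plist, run) -> None:
    """HYPOTHETICAL stand-in for a fixed hermes_cli.gateway.launchd_start()."""
    if not plist_path.exists():
        plist_path.parent.mkdir(parents=True, exist_ok=True)
        plist_path.write_text(render_plist(), encoding="utf-8")
        run(["launchctl", "load", str(plist_path)], check=True)
    run(["launchctl", "start", LABEL], check=True)


def test_launchd_start_recreates_missing_plist_and_loads_service(tmp_path: Path) -> None:
    plist = tmp_path / f"{LABEL}.plist"
    assert not plist.exists()  # precondition: the plist is missing
    launchctl = FakeLaunchctl()

    launchd_start(plist, render_plist=lambda: "<plist/>", run=launchctl)

    assert plist.exists()  # 1. the plist was recreated
    assert launchctl.calls == [
        ["launchctl", "load", str(plist)],  # 2. loaded with launchctl load
        ["launchctl", "start", LABEL],      # 3. then started successfully
    ]


# Under pytest, tmp_path is a built-in fixture; outside pytest, supply it by hand:
with tempfile.TemporaryDirectory() as tmp:
    test_launchd_start_recreates_missing_plist_and_loads_service(Path(tmp))
    test_passed = True
```

Run against the current launchd_start() logic, the same fake would make start raise with status 3 before any load, so the test fails until the missing-plist path is fixed.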
