fix(gateway): fix macOS gateway PID detection in find_gateway_pids #10636

Open
IISweetHeartII wants to merge 1 commit into NousResearch:main from IISweetHeartII:fix/macos-gateway-pid-detection


@IISweetHeartII

Summary

find_gateway_pids() fails to detect running gateway processes on macOS, causing hermes cron status to report "Gateway is not running" even when the gateway is alive and healthy (/health returns OK).

Two root causes:

1. launchctl list <label> output format mismatch

Modern macOS (12+) returns a property-list dictionary:

"PID" = 12345;
"Label" = "ai.hermes.gateway";

The existing parser expects the legacy tabular format (PID\tStatus\tLabel), so it never extracts the PID from the dictionary output.

Fix: Regex extraction ("PID"\s*=\s*(\d+)) for the modern dict format, with fallback to the legacy tabular parser.
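As a rough illustration of the approach described above, here is a minimal sketch of a parser that tries the dictionary format first and falls back to the tabular one. The function name `parse_launchctl_pid` and its signature are hypothetical and are not taken from the actual diff. Only the regex and the two output formats come from this description.

```python
import re
from typing import Optional

# Regex from the PR description: matches `"PID" = 12345;` in the dict format.
_DICT_PID_RE = re.compile(r'"PID"\s*=\s*(\d+)')


def parse_launchctl_pid(output: str, label: str) -> Optional[int]:
    """Hypothetical helper: extract a PID from `launchctl list <label>` output.

    Tries the modern property-list dictionary format first, then falls back
    to the legacy tab-separated format (PID, Status, Label).
    """
    match = _DICT_PID_RE.search(output)
    if match:
        return int(match.group(1))

    # Legacy fallback: one job per line, three tab-separated fields.
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        pid, _status, job_label = (field.strip() for field in fields)
        # A non-numeric PID field (for example "-") means the job is not running.
        if job_label == label and pid.isdigit():
            return int(pid)
    return None
```

Returning `None` when neither format yields a PID lets the caller fall through to other detection strategies, such as the `ps` scan discussed next.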

2. ps flag argument splitting (#9069, #9100)

# Before — "eww" treated as process name filter on BSD ps
["ps", "-A", "eww", "-o", "pid=,command="]  # Returns ~5 results

# After — flags combined correctly
["ps", "-Aeww", "-o", "pid=,command="]      # Returns ~400+ results

When eww is a separate argument, BSD ps interprets it as a process name filter instead of flags, returning only processes matching "eww" as a keyword.
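For context, the corrected invocation might be used along these lines. This is a sketch under assumptions, not the code in this PR: `match_pids`, `list_matching_pids`, and the example pattern are hypothetical names chosen for illustration. Only the `ps` argument list is taken from the description above.

```python
import subprocess
from typing import List

# Argument list from the PR description, with the flags combined in one token.
PS_COMMAND = ["ps", "-Aeww", "-o", "pid=,command="]


def match_pids(ps_output: str, patterns: List[str]) -> List[int]:
    """Hypothetical helper: return PIDs whose command line contains any pattern.

    Expects lines of the form "<pid> <command...>", which is what
    `-o pid=,command=` produces (the trailing "=" suppresses the header row).
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = int(parts[0]), parts[1]
        if any(pattern in command for pattern in patterns):
            pids.append(pid)
    return pids


def list_matching_pids(patterns: List[str]) -> List[int]:
    """Hypothetical wrapper: run ps and filter its output."""
    result = subprocess.run(PS_COMMAND, capture_output=True, text=True, check=False)
    return match_pids(result.stdout, patterns)
```

Keeping the parsing separate from the `subprocess.run` call means the matching logic can be exercised against captured output without a live process list, for example `match_pids(sample, ["hermes gateway"])`.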

Test plan

Tested on macOS 15.4 (Sequoia) with launchd-managed gateway:

  • hermes cron status correctly shows "Gateway is running — PID: <pid>"
  • hermes cron run <id> fires and completes successfully
  • hermes cron tick executes due jobs
  • Gateway health endpoint responds OK
  • No regression on hermes update (restart detection still works)

Related issues

Two issues prevent `find_gateway_pids()` from detecting the running
gateway process on macOS, causing `hermes cron status` to falsely
report "Gateway is not running" even when the process is alive and
healthy.

**1. `launchctl list <label>` output format mismatch**

Modern macOS (12+) returns a property-list dictionary format:

    "PID" = 12345;

The existing parser expected the legacy tabular format:

    PID	Status	Label

Added regex-based extraction for the modern dict format with a
fallback to the legacy tabular parser.

Refs: NousResearch#4820

**2. `ps` flag argument splitting**

`subprocess.run(["ps", "-A", "eww", ...])` passes `eww` as a
separate argument, which BSD `ps` interprets as a process name
filter rather than flags. This returns ~5 results instead of the
full process list (~400+), causing pattern matching to miss the
gateway process.

Fixed by combining flags: `["ps", "-Aeww", ...]`

Refs: NousResearch#9069, NousResearch#9100

Tested on macOS 15.4 (Sequoia) with launchd-managed gateway.
Before: `hermes cron status` → "Gateway is not running"
After: `hermes cron status` → "Gateway is running — PID: <pid>"
@alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) on Apr 25, 2026