Multi-profile gateway is not safe: kill_gateway_processes() and release_all_scoped_locks() are profile-blind #4587

@Git-on-my-level

Summary

When running multiple Hermes profiles concurrently on the same machine (each with its own HERMES_HOME and launchd service), several operations are not profile-safe and silently kill other profiles' gateways.

Reproduction

  1. Install two profiles (e.g. default + hermes-m4-sf-sales), each with their own launchd plist and Telegram bot token.
  2. Confirm both are running via launchctl list.
  3. Run hermes gateway stop, hermes gateway restart, or hermes update under any profile.
  4. Observe that all gateway processes across all profiles receive SIGTERM.

Bug 1: find_gateway_pids() finds ALL gateway processes

File: hermes_cli/gateway.py → find_gateway_pids()

The function runs ps aux and matches any process whose command line contains "hermes_cli.main gateway". It does not filter by HERMES_HOME, which means:

  • hermes gateway stop kills every gateway on the machine
  • hermes gateway restart kills every gateway, then only restarts its own
  • hermes update kills every gateway, then only restarts its own
  • Any agent-initiated hermes gateway stop (e.g. from a running session) kills the gateways of all profiles

Fix: Filter PIDs by their HERMES_HOME environment variable. This can be read with ps -E on macOS or from /proc/PID/environ on Linux. Only include PIDs whose HERMES_HOME matches the current profile.
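A minimal sketch of the filtering step, assuming a Linux host where /proc/PID/environ is readable. The helper names (parse_environ, read_hermes_home, filter_pids_by_profile) are hypothetical and not part of hermes_cli. Processes with no HERMES_HOME set are skipped here, so a real fix would still need to decide how to treat a default profile that relies on an implicit home directory, and macOS would need an equivalent lookup based on ps -E output.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse a NUL-separated environment blob (the /proc/<pid>/environ format)."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_hermes_home(pid: int) -> Optional[str]:
    """Return HERMES_HOME for a PID, or None if it is unset or unreadable."""
    try:
        raw = Path(f"/proc/{pid}/environ").read_bytes()
    except OSError:
        # Process exited, permission denied, or no /proc on this platform.
        return None
    return parse_environ(raw).get("HERMES_HOME")


def filter_pids_by_profile(
    pids: Iterable[int],
    current_home: str,
    lookup: Callable[[int], Optional[str]] = read_hermes_home,
) -> List[int]:
    """Keep only PIDs whose HERMES_HOME resolves to the current profile's home."""
    wanted = os.path.realpath(current_home)
    kept = []
    for pid in pids:
        home = lookup(pid)
        if home is not None and os.path.realpath(home) == wanted:
            kept.append(pid)
    return kept
```

The lookup parameter is injectable so that the comparison logic can be exercised without real processes, and so that a ps -E based reader could be swapped in on macOS.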

Bug 2: release_all_scoped_locks() nukes all lock files

File: gateway/status.py → release_all_scoped_locks()

Called during --replace to clean up stale locks. It deletes every .lock file in ~/.local/state/hermes/gateway-locks/, including locks actively held by other profiles. This can cause a second profile to lose its Telegram token lock, leading to duplicate polling or fatal errors.

Fix: Only release locks owned by the calling process's PID, or locks scoped to the calling profile's HERMES_HOME.
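A rough sketch of the PID-scoped variant. The on-disk lock format is not described in this issue, so the code below assumes, purely for illustration, that each .lock file is a JSON object with a "pid" field. The function name release_own_locks is hypothetical as well.

```python
import json
import os
from pathlib import Path
from typing import List, Optional


def release_own_locks(lock_dir: str, owner_pid: Optional[int] = None) -> List[str]:
    """Delete only the lock files that record owner_pid as their holder.

    ASSUMPTION: each lock file is a JSON object with a "pid" key. The real
    Hermes lock format may differ.
    """
    if owner_pid is None:
        owner_pid = os.getpid()
    removed = []
    for lock_path in sorted(Path(lock_dir).glob("*.lock")):
        try:
            record = json.loads(lock_path.read_text())
        except (OSError, ValueError):
            # Unreadable or malformed lock: leave it for its owner to handle.
            continue
        if isinstance(record, dict) and record.get("pid") == owner_pid:
            lock_path.unlink(missing_ok=True)
            removed.append(lock_path.name)
    return removed
```

Locks that cannot be parsed are deliberately left in place, on the reasoning that deleting a lock of unknown ownership is exactly the failure mode described above.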

Bug 3 (latent): KeepAlive SuccessfulExit=false + clean SIGTERM = permanent death

File: hermes_cli/gateway.py → generate_launchd_plist()

The generated plist uses:

<key>KeepAlive</key>
<dict>
    <key>SuccessfulExit</key>
    <false/>
</dict>

When the gateway receives SIGTERM (from bug 1 above, or any source), it shuts down cleanly and exits with code 0. launchd treats exit(0) as "successful" and does not restart the service. The profile stays dead until manually kickstarted.

This is latent (not a problem on its own) but becomes catastrophic when combined with bug 1: a gateway stop, update, or restart under any one profile leaves every other profile's gateway dead until it is manually kickstarted.

Fix: Use <key>KeepAlive</key><true/> instead. A gateway daemon should always be restarted regardless of exit code.
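For illustration, here is how a plist with an unconditional KeepAlive could be built using Python's standard plistlib module. This is a sketch and not the actual generate_launchd_plist() implementation. The function name build_launchd_plist, the label, and the argument list are placeholders.

```python
import plistlib
from typing import Sequence


def build_launchd_plist(label: str, program_args: Sequence[str], hermes_home: str) -> bytes:
    """Serialize a launchd job definition that is always restarted on exit."""
    payload = {
        "Label": label,
        "ProgramArguments": list(program_args),
        # Pin the profile so each service runs against its own HERMES_HOME.
        "EnvironmentVariables": {"HERMES_HOME": hermes_home},
        # True (rather than a SuccessfulExit dict) asks launchd to restart
        # the job regardless of its exit code.
        "KeepAlive": True,
    }
    return plistlib.dumps(payload)
```

One trade-off to keep in mind: with KeepAlive set to true, stopping the service requires unloading it through launchctl, because launchd restarts the process whenever it exits.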

Impact

Any multi-profile setup is fragile. A single hermes update on the default profile will silently and permanently kill all other profile gateways. The only recovery is manual launchctl kickstart.

Environment

  • macOS Sequoia, launchd service management
  • Multiple profiles under ~/.hermes/profiles/
  • Hermes version: latest (installed via hermes update)

Labels

  • P2 (Medium: degraded but workaround exists)
  • comp/cli (CLI entry point, hermes_cli/, setup wizard)
  • comp/gateway (Gateway runner, session dispatch, delivery)
  • type/bug (Something isn't working)
