fix(gateway): adopt unit's HERMES_HOME for --system CLI ops (salvage #22533) #22803

Merged: teknium1 merged 2 commits into main from salvage/pr-22533 on May 9, 2026
Conversation

teknium1 commented May 9, 2026

Summary

Salvage of #22533. hermes gateway restart --system (and status / stop) now read the gateway's actual runtime state under sudo instead of looking in /root/.hermes/.

Root cause

Under sudo, HERMES_HOME is stripped from the env and HOME=/root, so get_hermes_home() returns /root/.hermes. read_runtime_status() and get_running_pid() derive their paths from that — they look for gateway_state.json in /root/.hermes/ while the actually-running gateway wrote it under the unit's pinned HERMES_HOME=/home/<user>/.hermes/. _wait_for_systemd_service_restart polls read_runtime_status() for 60s, never sees running, times out, then forces another systemctl restart that SIGTERMs the in-progress new gateway.
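The path mismatch can be reproduced in isolation. The sketch below is not the project's code: `resolve_hermes_home` is a hypothetical stand-in for get_hermes_home(), assumed to prefer HERMES_HOME and fall back to $HOME/.hermes, and the user name `alice` is made up for illustration.

```python
from pathlib import PurePosixPath
from typing import Mapping


def resolve_hermes_home(env: Mapping[str, str]) -> PurePosixPath:
    """Hypothetical stand-in for get_hermes_home().

    Assumption: an explicit HERMES_HOME wins, otherwise $HOME/.hermes is used.
    """
    explicit = env.get("HERMES_HOME")
    if explicit:
        return PurePosixPath(explicit)
    return PurePosixPath(env.get("HOME", "/")) / ".hermes"


def state_file(env: Mapping[str, str]) -> PurePosixPath:
    """Path where the runtime status would be looked up for this environment."""
    return resolve_hermes_home(env) / "gateway_state.json"


# Environment of the running gateway: the unit pins HERMES_HOME.
unit_env = {"HOME": "/home/alice", "HERMES_HOME": "/home/alice/.hermes"}
# Environment of the CLI under sudo: HERMES_HOME stripped, HOME reset to /root.
sudo_env = {"HOME": "/root"}

written_by_gateway = state_file(unit_env)
read_by_cli = state_file(sudo_env)
print(written_by_gateway)  # /home/alice/.hermes/gateway_state.json
print(read_by_cli)         # /root/.hermes/gateway_state.json
```

Because the two paths differ, a poll that reads the second file cannot observe the state the gateway wrote to the first, which matches the timeout described above.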

Changes (contributor commit, re-authored to mbac)

  • hermes_cli/gateway.py: read the unit's pinned Environment= via systemctl show -p Environment, parse HERMES_HOME=..., and mirror it into os.environ before any HERMES_HOME-derived read in the three --system entrypoints (systemd_restart, systemd_status, systemd_stop). Early-out when system=False (user-scope inherits naturally). Errors swallowed so a transient systemctl failure doesn't break unrelated CLI ops.
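A minimal sketch of the approach described in this change, under stated assumptions. The function names `parse_hermes_home` and `adopt_unit_hermes_home` and the unit name passed in by the caller are hypothetical, not taken from hermes_cli/gateway.py. Tokenizing with `shlex.split` is an approximation of how systemd quotes values that contain spaces.

```python
import os
import shlex
import subprocess
from typing import Optional


def parse_hermes_home(show_output: str) -> Optional[str]:
    """Extract HERMES_HOME from `systemctl show -p Environment` output.

    Expected input looks like: Environment=KEY1=val1 KEY2=val2
    """
    prefix = "Environment="
    for line in show_output.splitlines():
        if not line.startswith(prefix):
            continue
        # shlex handles quoted assignments such as "KEY=value with spaces".
        for token in shlex.split(line[len(prefix):]):
            key, sep, value = token.partition("=")
            if sep and key == "HERMES_HOME" and value:
                return value
    return None


def adopt_unit_hermes_home(unit: str, system: bool) -> None:
    """Mirror the unit's pinned HERMES_HOME into os.environ (system scope only)."""
    if not system:
        return  # user scope inherits the caller's environment
    try:
        result = subprocess.run(
            ["systemctl", "show", unit, "-p", "Environment"],
            capture_output=True, text=True, timeout=10, check=False,
        )
        home = parse_hermes_home(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return  # swallow errors so unrelated CLI ops keep working
    if home:
        os.environ["HERMES_HOME"] = home


sample = "Environment=PATH=/usr/bin HERMES_HOME=/home/alice/.hermes\n"
print(parse_hermes_home(sample))  # /home/alice/.hermes
```

Keeping the parser separate from the subprocess call makes the parsing testable without systemd. The early return for `system=False` keeps the helper a no-op in user scope.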

Note: original PR was opened by mbac but committed locally as Test User <test@example.com> (default git config). Cherry-picked + re-authored to mbac <308068+mbac@users.noreply.github.com> to preserve attribution.

Validation

  • bash syntax clean. The change is at CLI entry, before HERMES_HOME-derived reads, and is a no-op for user-scope.

Closes #22035 via salvage.

github-actions (bot) commented May 9, 2026

🔎 Lint report: salvage/pr-22533 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7927 on HEAD, 7927 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4189 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

mbac and others added 2 commits May 9, 2026 13:38
When systemd_restart / systemd_status / systemd_stop run under sudo,
HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves
to /root/.hermes instead of the unit's pinned home. read_runtime_status
and get_running_pid then look at the wrong gateway_state.json — the
60s status poll never sees "running", times out, and forces another
systemctl restart that SIGTERMs the in-progress new gateway.

Read the unit's pinned HERMES_HOME from `systemctl show -p Environment`
and mirror it into os.environ before any HERMES_HOME-derived read.
Early-out when system=False (user-scope inherits naturally). Errors
swallowed so a transient systemctl failure doesn't break unrelated
CLI ops.

Closes #22035.
teknium1 force-pushed the salvage/pr-22533 branch from 7a27735 to f1ab3db May 9, 2026 20:38
teknium1 merged commit 5e2eba8 into main May 9, 2026
13 of 15 checks passed
teknium1 deleted the salvage/pr-22533 branch May 9, 2026 20:38
alt-glitch added the type/bug, P2, comp/gateway, and comp/cli labels May 9, 2026
Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: gateway restart --system always reports failure (60s timeout × 2) — wrapper reads runtime status from root's HERMES_HOME