
fix: add HERMES_SKIP_GATEWAY_RESTART env var to opt out of auto-restart #6703

Open
r266-tech wants to merge 1 commit into NousResearch:main from r266-tech:fix/skip-gateway-restart-env


@r266-tech

Contributor

Problem

hermes update unconditionally restarts all gateway processes after pulling new code (hermes_cli/main.py, lines 3758-3860). There is no env var, CLI flag, or other check to opt out.

When hermes update is invoked from a cron job scheduled by hermes's own in-process cron scheduler (the cron/scheduler.py tick loop and its ThreadPoolExecutor worker thread run inside the gateway Python process), the auto-restart sends SIGTERM to the gateway. The worker thread is killed mid-flight, so the post-run write to jobs.json never happens, the state.db session row is missing, and the cron output is incomplete.

Observed: last_run_at in jobs.json stayed stuck at the previous day's value, even though git reflog shows the pull succeeded 19 seconds before the gateway was killed.
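To make the failure mode concrete, here is a self-contained simulation. It is not Hermes code. A child Python process runs a job on a ThreadPoolExecutor worker, the job sends SIGTERM to its own process (standing in for the auto-restart), and the post-run write of last_run_at is never reached. The child script, the jobs.json layout, and the OLD_TIMESTAMP/NEW_TIMESTAMP placeholders are assumptions made for illustration.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the gateway process: a ThreadPoolExecutor worker
# runs a "job", then the process records last_run_at in jobs.json.
# Illustration only; this is NOT the code in cron/scheduler.py.
CHILD_SCRIPT = r"""
import json, os, signal, sys, time
from concurrent.futures import ThreadPoolExecutor

jobs_path = sys.argv[1]

def update_job():
    # Stand-in for `hermes update`: the pull succeeds, then the auto-restart
    # SIGTERMs the very process that hosts this worker thread.
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(5)  # the job is still "in flight" when the signal lands
    return "NEW_TIMESTAMP"

with ThreadPoolExecutor(max_workers=1) as pool:
    finished_at = pool.submit(update_job).result()

# Post-run bookkeeping: only reached if the process survives the job.
with open(jobs_path, "w") as f:
    json.dump({"last_run_at": finished_at}, f)
"""


def simulate_self_restart() -> dict:
    """Run the child once and report its exit code and the persisted state."""
    with tempfile.TemporaryDirectory() as tmp:
        jobs_path = Path(tmp) / "jobs.json"
        jobs_path.write_text(json.dumps({"last_run_at": "OLD_TIMESTAMP"}))
        proc = subprocess.run([sys.executable, "-c", CHILD_SCRIPT, str(jobs_path)])
        return {
            "returncode": proc.returncode,
            "last_run_at": json.loads(jobs_path.read_text())["last_run_at"],
        }


if __name__ == "__main__":
    print(simulate_self_restart())
```

On POSIX the child exits with return code -15 (killed by SIGTERM) and jobs.json still holds OLD_TIMESTAMP, which matches the stale last_run_at described above.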

Fix

Adds a check for a HERMES_SKIP_GATEWAY_RESTART env var before the auto-restart block. When the variable is set to any non-empty value, the restart is skipped and the user sees a message confirming the skip.

This lets cron jobs safely run hermes update by setting HERMES_SKIP_GATEWAY_RESTART=1 in the cron environment.
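As a sketch of the cron side, a Python job could invoke the CLI with the opt-out added to a copy of its environment. The wrapper name run_update_from_cron is hypothetical and not part of this PR; the command defaults to hermes update but is a parameter so it can be dry-run with a harmless probe.

```python
import os
import subprocess
import sys
from typing import Sequence

SKIP_VAR = "HERMES_SKIP_GATEWAY_RESTART"


def run_update_from_cron(
    cmd: Sequence[str] = ("hermes", "update"),
) -> subprocess.CompletedProcess:
    """Hypothetical wrapper: run `hermes update` with the auto-restart disabled.

    Copies the current environment and adds HERMES_SKIP_GATEWAY_RESTART=1,
    so only the child process sees the opt-out; os.environ is left untouched.
    """
    env = {**os.environ, SKIP_VAR: "1"}
    return subprocess.run(list(cmd), env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Dry run: a probe that echoes the variable instead of calling the real CLI.
    probe = [sys.executable, "-c", f"import os; print(os.environ.get('{SKIP_VAR}'))"]
    print(run_update_from_cron(probe).stdout.strip())
```

Passing env= rather than mutating os.environ keeps the opt-out scoped to the single update call. Since the restart is skipped, the running gateway presumably keeps executing the previously loaded code until it is restarted by some other means.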

Changes

  • hermes_cli/main.py: Add _do_restart = not os.environ.get("HERMES_SKIP_GATEWAY_RESTART") guard wrapping the existing try/except restart block (+6 lines net)
  • tests/hermes_cli/test_update_gateway_restart.py: Add TestSkipGatewayRestartEnvVar test class verifying the env var guard logic
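The shape of the guard in hermes_cli/main.py might look roughly like the following. This is a hypothetical sketch: the restart callable stands in for the existing try/except restart block, and the function name, log messages, and exception handling are assumptions rather than the PR's actual code.

```python
import os
from typing import Callable


def maybe_restart_gateways(
    restart: Callable[[], None],
    log: Callable[[str], None] = print,
) -> bool:
    """Hypothetical post-update step; returns True if a restart was attempted.

    `restart` stands in for the existing try/except restart logic; the
    messages below are illustrative, not the real CLI output.
    """
    # Any non-empty value opts out ("0" and "false" also count as set).
    _do_restart = not os.environ.get("HERMES_SKIP_GATEWAY_RESTART")
    if not _do_restart:
        log("Skipping gateway restart (HERMES_SKIP_GATEWAY_RESTART is set).")
        return False
    try:
        restart()
    except Exception as exc:
        # Assumed best-effort behavior: report the failure, don't abort the update.
        log(f"Gateway restart failed: {exc}")
    return True
```

Because the check uses plain truthiness, HERMES_SKIP_GATEWAY_RESTART=0 also skips the restart; only an unset or empty variable lets it proceed.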

Test plan

  • Existing tests for auto-restart unaffected (guard defaults to True when env var absent)
  • New test verifies env var disables restart flag
  • New test verifies absent env var allows restart
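A minimal version of these checks could be written with unittest and mock.patch.dict, which restores os.environ after each test. The helper _restart_enabled is hypothetical and simply reproduces the guard expression from the Changes list; the real TestSkipGatewayRestartEnvVar class may be structured differently. The third case (empty value) goes beyond the stated test plan and documents the truthiness semantics.

```python
import os
import unittest
from unittest import mock

ENV_VAR = "HERMES_SKIP_GATEWAY_RESTART"


def _restart_enabled() -> bool:
    # Hypothetical helper reproducing the guard expression from the PR.
    return not os.environ.get(ENV_VAR)


class TestSkipGatewayRestartEnvVar(unittest.TestCase):
    def test_env_var_disables_restart(self):
        with mock.patch.dict(os.environ, {ENV_VAR: "1"}):
            self.assertFalse(_restart_enabled())

    def test_absent_env_var_allows_restart(self):
        with mock.patch.dict(os.environ):  # snapshot; restored on exit
            os.environ.pop(ENV_VAR, None)
            self.assertTrue(_restart_enabled())

    def test_empty_value_allows_restart(self):
        with mock.patch.dict(os.environ, {ENV_VAR: ""}):
            self.assertTrue(_restart_enabled())
```

These run under pytest or python -m unittest without any gateway process, since they exercise only the guard expression.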

Closes #6702

When `hermes update` runs from a cron job hosted inside the gateway
process, the auto-restart kills the worker thread mid-flight, causing
lost cron state (stale jobs.json, missing state.db sessions).

Adds an env var check before the restart block so cron jobs can set
HERMES_SKIP_GATEWAY_RESTART=1 to safely update without self-kill.

Closes NousResearch#6702
@alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/cron (Cron scheduler and job management), and comp/cli (CLI entry point, hermes_cli/, setup wizard) on Apr 29, 2026
@alt-glitch

Collaborator

Competing PR with #6740 for the same issue, #6702; both add a HERMES_SKIP_GATEWAY_RESTART env var.



Development: merging this pull request may close the issue "fix: hermes update auto-restart kills in-process cron worker with no opt-out".
