Skip to content

fix(gateway): write planned-stop marker from systemd ExecStop#42555

Open
izumi0uu wants to merge 1 commit into
NousResearch:mainfrom
izumi0uu:fix/gateway-execstop-planned-stop-42517
Open

fix(gateway): write planned-stop marker from systemd ExecStop#42555
izumi0uu wants to merge 1 commit into
NousResearch:mainfrom
izumi0uu:fix/gateway-execstop-planned-stop-42517

Conversation

@izumi0uu

@izumi0uu izumi0uu commented Jun 9, 2026

Copy link
Copy Markdown

What does this PR do?

Routes systemctl stop through the existing planned-stop marker path instead of trying to infer who sent SIGTERM inside the gateway.

The generated systemd unit now includes an ExecStop helper that writes the planned-stop marker for $MAINPID before systemd delivers SIGTERM. That keeps intentional service stops on the same clean-shutdown path already used by hermes gateway stop, while leaving unexpected external SIGTERM handling unchanged.

This follows the direction in #42517 and avoids the INVOCATION_ID / under_systemd inference problem discussed against #41642 and #41690.

Related Issue

Fixes #42517

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • Added a hidden hermes gateway planned-stop-helper --pid <pid> subcommand for service-managed stop flows
  • Updated generated systemd user/system units to include:
    • ExecStop=<python> -m hermes_cli.main gateway planned-stop-helper --pid $MAINPID
  • Kept ExecStop non-recursive:
    • it writes the planned-stop marker only
    • it does not call hermes gateway stop
  • Added regression coverage for:
    • ExecStop presence in generated units
    • helper wiring using $MAINPID
    • failure path when marker writing fails
    • no root/tmp path leakage in remapped system units

How to Test

  1. Generate/install the updated systemd gateway unit.
  2. Verify the unit contains ExecStop=... gateway planned-stop-helper --pid $MAINPID.
  3. Start the gateway service and stop it with systemctl --user stop <service> (or system scope as appropriate).
  4. Confirm the gateway exits through the planned-stop path rather than relying on INVOCATION_ID-based signal-source inference.
  5. Run the targeted regression tests below.

Targeted local validation:

pytest tests/hermes_cli/test_gateway_service.py -q -k 'planned_stop_helper or avoids_recursive_execstop or system_unit_has_no_root_paths or systemd_stop_marks_running_gateway_as_planned_stop'
pytest tests/hermes_cli/test_systemd_optional_directives.py -q
pytest tests/hermes_cli/test_gateway_service.py -q -k 'systemd_unit_uses_dot_venv_when_detected or systemd_unit_includes_profile or system_unit_includes_wsl_windows_interop_paths or user_unit_includes_resolved_node_directory_in_path'

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS (targeted unit-generation and regression tests)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

N/A

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/cli CLI entry point, hermes_cli/, setup wizard labels Jun 9, 2026

@tonydwb tonydwb left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved

Overview

This PR adds a planned-stop helper that writes a marker file when systemd ExecStop fires, enabling clean gateway shutdown.

Changes

  • Adds gateway_planned_stop_helper command
  • Adds ExecStop to systemd unit files
  • Good test coverage

Quality

  • Clean fix with proper subprocess handling using shlex.quote
  • No security concerns

Reviewed by Hermes Agent

@benbarclay

Copy link
Copy Markdown
Collaborator

Heads-up for context (not asking to close this): #43236 just merged, fixing the Docker/s6 side of the autostart problem (#42675) by reusing the planned-stop marker — including writing the marker from S6ServiceManager.stop() before s6-svc -d.

This PR targets the systemd ExecStop path (#41631), which #43236 didn't touch, so it's complementary rather than superseded. One thing worth reconciling now that #43236 landed: it added GatewayRunner._signal_initiated_shutdown and the persist gate in _stop_impl (signal-initiated → persist running, planned/marked → persist stopped). Your systemd ExecStop marker write should slot into that same classification cleanly — once you rebase on current main, the marker you write will land you in the planned-stop branch automatically. Worth a rebase + re-verify against the new gate.

Make generated systemd units write the planned-stop marker from ExecStop
before systemd delivers SIGTERM to the gateway process. This keeps
`systemctl stop` on the existing planned-stop path instead of teaching the
signal handler to guess who sent SIGTERM.

The helper is a hidden `hermes gateway planned-stop-helper --pid $MAINPID`
subcommand so the unit can mark the target process without recursively
calling `hermes gateway stop` from ExecStop.

Constraint: systemd stop and unexpected external SIGTERM are indistinguishable inside the signal handler
Rejected: infer SIGTERM source from INVOCATION_ID or under_systemd flags | those only prove the runtime, not the sender
Confidence: high
Scope-risk: narrow
Directive: Keep planned-stop classification on the service stop path rather than adding more SIGTERM-source heuristics
Tested: pytest tests/hermes_cli/test_gateway_service.py -q -k 'planned_stop_helper or avoids_recursive_execstop or system_unit_has_no_root_paths or systemd_stop_marks_running_gateway_as_planned_stop'
Tested: pytest tests/hermes_cli/test_systemd_optional_directives.py -q
Tested: pytest tests/hermes_cli/test_gateway_service.py -q -k 'systemd_unit_uses_dot_venv_when_detected or systemd_unit_includes_profile or system_unit_includes_wsl_windows_interop_paths or user_unit_includes_resolved_node_directory_in_path'
Not-tested: live systemctl stop against an installed Linux systemd unit
@izumi0uu izumi0uu force-pushed the fix/gateway-execstop-planned-stop-42517 branch from 27ad135 to b4c6a37 Compare June 10, 2026 06:30
@izumi0uu

Copy link
Copy Markdown
Author

Rebased onto the latest main and re-verified the relevant gateway and systemd tests. No changes to the PR diff, just replayed on top of the latest main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(gateway): ExecStop should write planned-stop marker instead of inferring signal source

4 participants