test(conftest): block tests from killing the live hermes-gateway by teknium1 · Pull Request #23397 · NousResearch/hermes-agent

teknium1 · 2026-05-10T20:07:42Z

Summary

Tests can no longer SIGTERM your running gateway. The shutdown forensics from #23285 caught tests/hermes_cli/ pytest runs killing the live gateway 5+ times in 3 days; one new autouse fixture in tests/conftest.py makes the leak structurally impossible.

Root cause

The hermetic conftest already isolates HERMES_HOME to a tempdir, but find_gateway_pids runs a psutil scan across the whole machine. The isolated gateway PID file is therefore irrelevant: the scan returns the live process, and the unmocked os.kill does the rest. The systemd path has the same hole: a real subprocess.run(["systemctl", "--user", "restart", "hermes-gateway"]) bypasses HERMES_HOME entirely.

Changes

tests/conftest.py: new _live_system_guard autouse fixture (+167 LOC, single file).
- os.kill rejects any PID outside the test process subtree with a clear RuntimeError (uses psutil to walk parents).
- subprocess.run / Popen / call / check_call / check_output reject systemctl <restart|start|stop|kill|reload|reset-failed|enable|disable|mask|unmask|daemon-reload> hermes-gateway invocations. Read-only status / show / list-units still pass through.
- Tests that legitimately need real signal delivery opt out with @pytest.mark.live_system_guard_bypass.
We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids — tests of those functions themselves need the real implementation. Discovery without delivery is harmless.
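To make the os.kill rule concrete, here is a minimal sketch of a subtree check and a guarded kill wrapper. It is not the code from this PR: the names in_subtree, make_guarded_kill and _psutil_parent are hypothetical, and the parent lookup is injectable so the logic can be exercised without psutil installed.

```python
from typing import Callable, Optional


def _psutil_parent(pid: int) -> Optional[int]:
    """Return the parent PID via psutil, or None if it cannot be read."""
    import psutil  # third-party; imported lazily so the sketch loads without it

    try:
        return psutil.Process(pid).ppid()
    except psutil.Error:
        return None


def in_subtree(pid: int, root: int,
               parent_of: Callable[[int], Optional[int]] = _psutil_parent) -> bool:
    """True if `pid` is `root` itself or one of its descendants."""
    seen = set()
    current: Optional[int] = pid
    # Walk parent links upward until we hit the root, run out of parents,
    # or detect a cycle (PIDs can be reused while we are walking).
    while current is not None and current > 0 and current not in seen:
        if current == root:
            return True
        seen.add(current)
        current = parent_of(current)
    return False


def make_guarded_kill(real_kill: Callable[[int, int], None], root: int,
                      parent_of: Callable[[int], Optional[int]] = _psutil_parent):
    """Wrap `real_kill` so signals only reach processes under `root`."""
    def guarded_kill(pid: int, sig: int) -> None:
        if not in_subtree(pid, root, parent_of):
            raise RuntimeError(
                f"test tried to signal PID {pid}, which is outside the "
                f"test process subtree rooted at {root}"
            )
        return real_kill(pid, sig)
    return guarded_kill
```

In a conftest, an autouse fixture could plausibly install this with monkeypatch.setattr(os, "kill", make_guarded_kill(os.kill, os.getpid())) and skip the patch when the test carries the bypass marker; that wiring is an assumption, not a description of the actual fixture.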

Why guard rather than fix individual tests

Auditing every cmd_update / kill_gateway_processes / stop_profile_gateway test for proper mocking is a recurring tax — every new test in the area is one missed mock away from the same leak. A single conftest guard catches all of them, including future ones.

Validation

Run	Failures	Suite (tests/hermes_cli/ + tests/cli/ + tests/gateway/)
Before (main only)	17 (all pre-existing, unrelated to gateway kill)	10085 passed
After (this PR)	17 (identical set)	10085 passed

Zero new failures introduced. Live gateway PID 1234988 survived a scripts/run_tests.sh tests/hermes_cli/ tests/cli/ tests/gateway/ run that previously would have killed it.

The 17 pre-existing failures (test_cli_save_config_value, test_tts_media_routing, test_update_streaming, TestFindGatewayPidsExclude, etc.) are all separate issues — No module named 'ruamel', missing GatewayRunner attributes from the recent platform-policy work, and a few stale assertions. Not in scope here.

github-actions · 2026-05-10T20:08:43Z

🔎 Lint report: `hermes/hermes-938a35d0` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8074 on HEAD, 8073 on base (🆕 +1)

🆕 New issues (1):

Rule	Count
`unresolved-import`	1

First entries

tests/conftest.py:681: [unresolved-import] unresolved-import: Cannot resolve imported module `psutil`

✅ Fixed issues: none

Unchanged: 4253 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest runs sending SIGTERM to the developer's live gateway 5+ times in 3 days.

Root cause: when a single test forgets to mock os.kill or find_gateway_pids, the real call leaks past the hermetic HERMES_HOME isolation. find_gateway_pids' psutil scan walks the whole machine and returns the live gateway PID, then the unmocked os.kill delivers the signal.

Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes, and stop_profile_gateway code paths, install a single autouse guard in tests/conftest.py that blocks the two primitives that actually cause the damage:
- os.kill rejects any PID outside the test process subtree with a hard RuntimeError so the offending test gets a stack trace instead of silently murdering the real gateway.
- subprocess.run / Popen / call / check_call / check_output reject any 'systemctl <verb> hermes-gateway' invocation that would mutate the live unit. Read-only systemctl calls (status, show, list-units) still pass through.

We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids: tests of those functions themselves need the real implementation. Discovery without delivery is harmless; the os.kill + systemctl guards catch the actual damage path.

Tests that legitimately need real signal delivery (e.g. PTY tests signalling their own child) opt out via @pytest.mark.live_system_guard_bypass.

Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce the same 17 failures with and without this guard (all pre-existing on main, unrelated to gateway-kill leaks). The live gateway survives the test run that previously SIGTERMed it.
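The systemctl rule from this first commit can be sketched as a check on the argument list. This is an illustration under assumptions rather than the PR's code: is_mutating_systemctl and guard_runner are hypothetical names, the verb list is copied from the PR description, and only the first token is compared against systemctl.

```python
import shlex
from typing import Callable, Sequence

# Verbs taken from the PR description; anything else (status, show,
# list-units, ...) is treated as read-only and allowed through.
MUTATING_VERBS = frozenset({
    "restart", "start", "stop", "kill", "reload", "reset-failed",
    "enable", "disable", "mask", "unmask", "daemon-reload",
})
UNIT = "hermes-gateway"


def is_mutating_systemctl(argv: Sequence[str]) -> bool:
    """True for `systemctl [options] <mutating verb> ... hermes-gateway`."""
    if not argv or argv[0] != "systemctl":
        return False
    # Drop options such as --user so the first remaining token is the verb.
    positional = [a for a in argv[1:] if not a.startswith("-")]
    if not positional or positional[0] not in MUTATING_VERBS:
        return False
    return any(UNIT in arg for arg in positional[1:])


def guard_runner(real_run: Callable) -> Callable:
    """Wrap a subprocess-style callable so mutating calls raise instead."""
    def guarded(cmd, *args, **kwargs):
        argv = shlex.split(cmd) if isinstance(cmd, str) else [str(c) for c in cmd]
        if is_mutating_systemctl(argv):
            raise RuntimeError(f"blocked live systemctl call: {argv!r}")
        return real_run(cmd, *args, **kwargs)
    return guarded
```

The same wrapper shape would apply to each of run, call, check_call and check_output. Popen is a class rather than a function, so it needs separate handling that this sketch does not cover.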

The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg and a narrow subset of subprocess invocations. Tests still SIGTERMed the live gateway today (May 10) because the guard had structural holes. Plug them all:
- subprocess: also wrap getoutput, getstatusoutput
- os.system, os.popen - completely unwrapped before
- pty.spawn - completely unwrapped before
- asyncio.create_subprocess_exec / create_subprocess_shell - bypassed the subprocess module entirely; now wrapped
- Subprocess command inspection now looks at the WHOLE command string, not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c 'systemctl', setsid systemctl, /usr/bin/systemctl, etc.
- New process-killer block: pkill / killall / taskkill / fuser targeting hermes/python patterns is now refused
- os.kill PID 0 (own group) allowed; PID -1 (every process we can signal) refused
- subprocess.Popen wrapper preserves __class_getitem__ so third-party packages that use Popen[bytes] as a type annotation still import

Coverage is locked in by tests/test_live_system_guard_self_test.py - exercises every primitive against a guaranteed-foreign PID and asserts the guard fires. Adding a new kill primitive without updating the guard breaks CI.

scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py when present (developer-machine convenience), so even worktrees that predate this commit get the protection on subsequent test runs through the canonical wrapper.
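The whole-command inspection and the process-killer block described above might look roughly like the following. The regular expressions and the names flatten and is_dangerous are assumptions made for illustration; the real guard's patterns are not shown in this PR text. The check is deliberately conservative and may refuse some harmless commands that merely mention a mutating verb.

```python
import re
from typing import Sequence, Union

Command = Union[str, bytes, Sequence[object]]

# Match the program name anywhere in the command, whether it appears bare,
# behind a path separator, inside quotes, or after a shell operator.
_BOUNDARY = r"(?:^|[\s/'\";|&(])"
_SYSTEMCTL = re.compile(_BOUNDARY + r"systemctl\b")
_MUTATING = re.compile(
    r"\b(?:restart|start|stop|kill|reload|reset-failed|enable|disable|"
    r"mask|unmask|daemon-reload)\b"
)
_KILLERS = re.compile(_BOUNDARY + r"(?:pkill|killall|taskkill|fuser)\b")
_TARGETS = re.compile(r"hermes|python", re.IGNORECASE)


def flatten(cmd: Command) -> str:
    """Render a str, bytes, or argv-style command as one searchable string."""
    if isinstance(cmd, bytes):
        return cmd.decode("utf-8", errors="replace")
    if isinstance(cmd, str):
        return cmd
    return " ".join(str(part) for part in cmd)


def is_dangerous(cmd: Command) -> bool:
    """True if the command could mutate the gateway unit or kill it by name."""
    text = flatten(cmd)
    if (_SYSTEMCTL.search(text) and "hermes-gateway" in text
            and _MUTATING.search(text)):
        return True
    if _KILLERS.search(text) and _TARGETS.search(text):
        return True
    return False
```

Scanning the flattened string rather than the first token is what lets the check see through wrappers such as sudo, env, setsid and bash -c, at the cost of occasional false positives.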
