Summary
On Windows, hermes gateway can crash on startup after an abrupt shutdown/power-off because a stale gateway.pid is left behind and gateway/status.py:get_running_pid() uses os.kill(pid, 0) as the liveness probe.
When the PID is stale, Windows may raise a generic OSError instead of ProcessLookupError. In my case it raised WinError 11; probing the same stale PID from Git Bash also produced WinError 87 (The parameter is incorrect). Because get_running_pid() only treats ProcessLookupError / PermissionError as stale, the startup path crashes before it can remove the old PID file.
Deleting the stale gateway.pid / gateway_state.json immediately fixes startup.
Environment
- Hermes Agent: v0.8.0 (2026.4.8)
- Python: 3.12.11
- OS: Windows 11 10.0.22621
Repro
- Start gateway normally.
- Hard power-off / abrupt shutdown so Hermes cannot clean up gateway.pid.
- Boot again and start hermes gateway.
- Startup crashes while checking the old PID.
Observed traceback
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\venv\Scripts\hermes.exe\__main__.py", line 10, in <module>
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\hermes_cli\main.py", line 5671, in main
    args.func(args)
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\hermes_cli\main.py", line 670, in cmd_gateway
    gateway_command(args)
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\hermes_cli\gateway.py", line 2302, in gateway_command
    run_gateway(verbose, quiet=quiet, replace=replace)
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\hermes_cli\gateway.py", line 1341, in run_gateway
    success = asyncio.run(start_gateway(replace=replace, verbosity=verbosity))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\12737\AppData\Roaming\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\asyncio\runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\12737\AppData\Roaming\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\12737\AppData\Roaming\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\asyncio\base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\gateway\run.py", line 7717, in start_gateway
    existing_pid = get_running_pid()
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\12737\AppData\Local\hermes\hermes-agent\gateway\status.py", line 400, in get_running_pid
    os.kill(pid, 0)  # signal 0 = existence check, no actual signal sent
    ^^^^^^^^^^^^^^^
OSError: [WinError 11] An attempt was made to load a program with an incorrect format
Why this seems wrong
Current code in gateway/status.py:get_running_pid():
- reads gateway.pid
- calls os.kill(pid, 0)
- only treats ProcessLookupError / PermissionError as “not running”
On Windows, stale/non-probeable PIDs can also raise plain OSError (WinError 11, WinError 87, possibly others). That should not hard-crash gateway startup.
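To illustrate why the narrow except clause lets the error through, here is a small standalone sketch. It does not use the Hermes code: narrow_handler is a hypothetical stand-in for the handling described above, and errno.EINVAL is only an example of an error code that has no dedicated OSError subclass.

```python
import errno


def narrow_handler(exc: OSError) -> str:
    """Mimic an except clause that only knows the two OSError subclasses."""
    try:
        raise exc
    except (ProcessLookupError, PermissionError):
        return "treated as stale"
    except OSError as other:
        # In the real startup path there is no such fallback, so the
        # exception propagates and aborts startup.
        return f"escapes as {type(other).__name__}"


# OSError(errno.ESRCH, ...) is constructed as a ProcessLookupError.
print(narrow_handler(OSError(errno.ESRCH, "No such process")))
# An error code with no dedicated subclass stays a plain OSError.
print(narrow_handler(OSError(errno.EINVAL, "The parameter is incorrect")))
```

The first call is caught by the narrow clause; the second is not, which matches the crash in the traceback.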
Workaround
Delete:
%LOCALAPPDATA%\hermes\gateway.pid
%LOCALAPPDATA%\hermes\gateway_state.json
After deleting those files, hermes gateway starts normally again.
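The manual workaround can also be scripted. The following is a minimal sketch, assuming the two files live directly in the Hermes data directory as shown above; remove_stale_gateway_files is a hypothetical helper, not part of Hermes, and it should only be run when no gateway is actually running.

```python
from pathlib import Path


def remove_stale_gateway_files(hermes_dir: Path) -> list[str]:
    """Delete gateway.pid and gateway_state.json if present.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in ("gateway.pid", "gateway_state.json"):
        target = hermes_dir / name
        try:
            target.unlink()
        except FileNotFoundError:
            # Nothing to clean up for this file.
            continue
        removed.append(name)
    return removed
```

On the machine from this report, the call would be remove_stale_gateway_files(Path(os.environ["LOCALAPPDATA"]) / "hermes"), after importing os.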
Suggested fix
Make PID liveness checks Windows-safe. At minimum, get_running_pid() should treat Windows OSError from os.kill(pid, 0) as stale and remove the PID file rather than crashing.
It may also be worth auditing the other os.kill(pid, 0) probes in gateway/profile code for the same assumption.
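A rough sketch of what the suggested handling could look like follows. This is not the actual gateway/status.py code: probe_pid and read_live_pid are hypothetical names, the pid_file parameter is an assumption about how the path is passed in, and the kill parameter exists only so the behaviour can be exercised without a real stale PID.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def probe_pid(pid: int, kill: Callable[[int, int], None] = os.kill) -> bool:
    """Return True only if the probe completes without raising."""
    try:
        kill(pid, 0)
    except OSError:
        # ProcessLookupError and PermissionError are OSError subclasses, so
        # this also covers plain OSError such as WinError 11 / WinError 87.
        return False
    return True


def read_live_pid(
    pid_file: Path, kill: Callable[[int, int], None] = os.kill
) -> Optional[int]:
    """Return the PID from pid_file if it looks alive, else None.

    A PID whose probe fails is considered stale and the file is removed.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        # Missing or unparsable PID file: report "not running".
        return None
    if probe_pid(pid, kill):
        return pid
    pid_file.unlink(missing_ok=True)
    return None
```

Catching every OSError is the bluntest option. It means a probe that fails for an unrelated reason is also treated as "not running", so a narrower variant might apply the broad catch only when sys.platform == "win32" and keep the existing behaviour elsewhere.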