Background
The Hermes gateway on Windows currently has no robust auto-restart on crash. The login Startup .cmd covers reboot/login persistence; a CREATE_BREAKAWAY_FROM_JOB fix (PR #TBD) covers the specific case of "GUI update SIGTERMs the gateway and the watcher survives Electron's job teardown to respawn it"; but if the gateway dies mid-session for any other reason (OOM, taskkill, native crash), the user must run hermes gateway start manually.
On Linux this is solved by systemd Restart=always. On macOS by launchd KeepAlive. On Windows the answer is a real Windows Service registered with the Service Control Manager and SCM-driven RecoveryAction for auto-restart. Tailscale does exactly this — see cmd/tailscaled/install_windows.go:
    ra := []mgr.RecoveryAction{
        {mgr.ServiceRestart, 1 * time.Second},
        {mgr.ServiceRestart, 2 * time.Second},
        {mgr.ServiceRestart, 4 * time.Second},
        {mgr.ServiceRestart, 9 * time.Second},
        {mgr.ServiceRestart, 16 * time.Second},
        {mgr.ServiceRestart, 25 * time.Second},
        {mgr.ServiceRestart, 36 * time.Second},
        {mgr.ServiceRestart, 49 * time.Second},
        {mgr.ServiceRestart, 64 * time.Second},
    }
    service.SetRecoveryActions(ra, /*resetPeriodSecs=*/60)
Services run in session 0, so no console window can appear on the user's desktop, which sidesteps the Task Scheduler conhost-flash issue entirely.
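For the Python side, a minimal sketch of the same idea with pywin32 might look like the following. The helper names (backoff_delays_seconds, failure_actions, apply_failure_actions) are hypothetical, and the dictionary layout passed to ChangeServiceConfig2 is my reading of the pywin32 docs for SERVICE_CONFIG_FAILURE_ACTIONS, so it is worth verifying before relying on it. Unlike Tailscale's list, this schedule uses pure squares and omits the extra 2s step.

```python
from typing import Any, Dict, List

SC_ACTION_RESTART = 1  # numeric value of SC_ACTION_RESTART in winsvc.h


def backoff_delays_seconds(steps: int = 8) -> List[int]:
    """Square-number delays: 1, 4, 9, ... seconds (hypothetical helper)."""
    return [n * n for n in range(1, steps + 1)]


def failure_actions(delays_s: List[int], reset_period_s: int = 60) -> Dict[str, Any]:
    """Build the dict assumed to match pywin32's SERVICE_FAILURE_ACTIONS layout."""
    return {
        "ResetPeriod": reset_period_s,  # seconds without failure before the count resets
        "RebootMsg": "",
        "Command": "",
        "Actions": [(SC_ACTION_RESTART, s * 1000) for s in delays_s],  # delays in ms
    }


def apply_failure_actions(service_name: str, info: Dict[str, Any]) -> None:
    """Windows only: write the recovery actions through the SCM (needs admin)."""
    import win32service  # deferred so this module still imports off Windows

    scm = win32service.OpenSCManager(None, None, win32service.SC_MANAGER_ALL_ACCESS)
    try:
        svc = win32service.OpenService(scm, service_name, win32service.SERVICE_ALL_ACCESS)
        try:
            win32service.ChangeServiceConfig2(
                svc, win32service.SERVICE_CONFIG_FAILURE_ACTIONS, info
            )
        finally:
            win32service.CloseServiceHandle(svc)
    finally:
        win32service.CloseServiceHandle(scm)
```

On an elevated Windows shell this would be called as apply_failure_actions("HermesGateway", failure_actions(backoff_delays_seconds())) after the service has been installed.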
Why not Task Scheduler
We tried a per-minute schtasks supervisor task in #PR. It works functionally but flashes a console window that steals focus on every firing, even with all the documented mitigations (GUI-subsystem pythonw direct invoke, XML <Hidden>true</Hidden>, InteractiveToken, LeastPrivilege). Looking at prior art:
- openclaw uses a VBS Run(..., 0, False) wrapper. Suppresses the window but Super User Q971162 confirms focus-steal still occurs in some cases.
- Ollama doesn't use Task Scheduler at all — GUI tray exe with a Startup shortcut and an internal monitor+worker pair.
- Tailscale uses a real Windows Service (this issue).
- Syncthing uses --no-console flag + Startup folder.
Task Scheduler is the wrong tool for "Restart=always" on Windows.
What this issue tracks
Refactor hermes gateway install on Windows to register the gateway as a real Windows Service using pywin32's win32serviceutil.ServiceFramework, with SCM RecoveryActions for auto-restart. Concretely:
- Add a WindowsGatewayService(win32serviceutil.ServiceFramework) subclass that wraps the existing gateway run entrypoint.
- hermes gateway install on Windows calls win32serviceutil.InstallService + sets recovery actions (quadratic backoff like Tailscale).
- hermes gateway start|stop|restart|status route to sc.exe / SCM APIs instead of (or in addition to) schtasks for service-mode installs.
- Preserve the Startup .cmd fallback for users without admin rights to install a service.
- Decide whether to run in session 0 (no GUI access, simplest) or session-1 user context (requires SCM session brokering, more complex but lets the gateway show notifications etc.). Tailscale runs in session 0; we may need session 1 because the gateway tools can include GUI helpers.
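A rough sketch of the service subclass and the command routing, assuming the gateway entrypoint can be made to block until a stop event is set. run_gateway and sc_commands are hypothetical placeholders, not existing Hermes functions; the pywin32 import is guarded so the module still loads where pywin32 is missing.

```python
import threading
from typing import List

try:
    import win32service
    import win32serviceutil
except ImportError:  # not on Windows, or pywin32 not installed
    win32serviceutil = None

SERVICE_NAME = "HermesGateway"


def run_gateway(stop_event: threading.Event) -> None:
    """Hypothetical stand-in for the existing gateway run entrypoint."""
    stop_event.wait()


def sc_commands(action: str, name: str = SERVICE_NAME) -> List[List[str]]:
    """Map a `hermes gateway <action>` verb to sc.exe invocations."""
    if action == "restart":  # sc.exe has no restart verb, so stop then start
        return [["sc.exe", "stop", name], ["sc.exe", "start", name]]
    verbs = {"start": "start", "stop": "stop", "status": "query"}
    return [["sc.exe", verbs[action], name]]


if win32serviceutil is not None:

    class WindowsGatewayService(win32serviceutil.ServiceFramework):
        _svc_name_ = SERVICE_NAME
        _svc_display_name_ = "Hermes Gateway"

        def __init__(self, args):
            super().__init__(args)
            self._stop_event = threading.Event()

        def SvcStop(self):
            # Tell the SCM a stop is in progress, then unblock SvcDoRun.
            self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
            self._stop_event.set()

        def SvcDoRun(self):
            # Blocks until SvcStop sets the event.
            run_gateway(self._stop_event)
```

Two caveats for a real implementation: sc.exe stop returns before the service has fully stopped, so restart would need to poll for the STOPPED state before issuing start; and by default the SCM applies recovery actions only when the process dies without reporting SERVICE_STOPPED, which is the crash case this issue cares about.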
Acceptance criteria
- hermes gateway install on Windows (with admin) registers a service named HermesGateway (per profile: HermesGateway-<profile>).
- Service has RecoveryActions configured for quadratic backoff with a 60s reset period.
- hermes update no longer needs CREATE_BREAKAWAY_FROM_JOB for the gateway respawn watcher because the SCM does the restart; the watcher logic can be removed on Windows.
- Killing the gateway PID externally (e.g. taskkill /F /PID <pid>) results in SCM respawning it within ~2 seconds.
- No console window ever appears on the user's desktop for any of this.
- Non-admin users get a graceful fallback to the existing Startup .cmd (no service install).
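The naming and fallback criteria above could be sketched roughly as follows. All three function names are hypothetical; the only real API used is shell32's IsUserAnAdmin, reached through ctypes, which is one common way to detect elevation and not necessarily the one Hermes should adopt.

```python
import ctypes
from typing import Optional


def service_name(profile: Optional[str] = None) -> str:
    """HermesGateway for the default profile, HermesGateway-<profile> otherwise."""
    return f"HermesGateway-{profile}" if profile else "HermesGateway"


def is_admin() -> bool:
    """True when the current process is elevated; always False off Windows."""
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except AttributeError:  # ctypes.windll only exists on Windows
        return False


def choose_install_mode(admin: bool) -> str:
    """Pick the service install when elevated, else the Startup .cmd fallback."""
    return "service" if admin else "startup-cmd"
```

hermes gateway install would then branch on choose_install_mode(is_admin()) and skip the SCM registration entirely in the "startup-cmd" case.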
References