fix(gateway/run): Windows detached gateway /restart fails in job objects (WinError 5) #42116

@anthonytrance

Windows Detached Gateway Restart Failure in Job Objects

Context

When the Hermes gateway runs on Windows as a background service via pythonw.exe (installed using hermes gateway install), the process launches inside a restricted Windows job object that does not allow child processes to break away (JOB_OBJECT_LIMIT_BREAKAWAY_OK is not set).
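
To confirm whether a given gateway process is subject to this restriction, the job's limit flags can be queried from inside the process. The following is a minimal diagnostic sketch, not part of Hermes: the helper names `breakaway_allowed` and `current_job_limit_flags` are hypothetical, and the structure mirrors the Win32 JOBOBJECT_BASIC_LIMIT_INFORMATION layout. On non-Windows platforms, or when the query fails (for example because the process is not in a job), it returns None.

```python
import ctypes
import sys

JOB_OBJECT_LIMIT_BREAKAWAY_OK = 0x00000800   # Win32 job limit flag
JOB_OBJECT_BASIC_LIMIT_INFORMATION = 2       # JOBOBJECTINFOCLASS value


class JobBasicLimitInfo(ctypes.Structure):
    """Mirrors the Win32 JOBOBJECT_BASIC_LIMIT_INFORMATION structure."""
    _fields_ = [
        ("PerProcessUserTimeLimit", ctypes.c_int64),
        ("PerJobUserTimeLimit", ctypes.c_int64),
        ("LimitFlags", ctypes.c_uint32),
        ("MinimumWorkingSetSize", ctypes.c_size_t),
        ("MaximumWorkingSetSize", ctypes.c_size_t),
        ("ActiveProcessLimit", ctypes.c_uint32),
        ("Affinity", ctypes.c_size_t),
        ("PriorityClass", ctypes.c_uint32),
        ("SchedulingClass", ctypes.c_uint32),
    ]


def breakaway_allowed(limit_flags):
    """True if the job's limit flags permit CREATE_BREAKAWAY_FROM_JOB."""
    return bool(limit_flags & JOB_OBJECT_LIMIT_BREAKAWAY_OK)


def current_job_limit_flags():
    """Return LimitFlags for the calling process's job, or None if unavailable."""
    if sys.platform != "win32":
        return None
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.QueryInformationJobObject.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p,
        ctypes.c_uint32, ctypes.c_void_p,
    ]
    k32.QueryInformationJobObject.restype = ctypes.c_int
    info = JobBasicLimitInfo()
    # A NULL job handle means "the job associated with the calling process".
    ok = k32.QueryInformationJobObject(
        None, JOB_OBJECT_BASIC_LIMIT_INFORMATION,
        ctypes.byref(info), ctypes.sizeof(info), None,
    )
    return int(info.LimitFlags) if ok else None
```

Logging `breakaway_allowed(flags)` at gateway startup would make it clear in advance whether a spawn with CREATE_BREAKAWAY_FROM_JOB can succeed.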

The Bug

A recent commit (around June 7, 2026) consolidated all Windows detached spawns under windows_detach_popen_kwargs(). Inside gateway/run.py's _launch_detached_restart_command, this adds the following creation flag to the spawn:
CREATE_BREAKAWAY_FROM_JOB (0x01000000)

Because the gateway process runs inside a job that does not permit breakaway, Windows rejects the process-creation call and a permission error is raised immediately:

ERROR gateway.run: Failed to launch detached gateway restart: [WinError 5] Access is denied

The background watcher never launches, so the gateway shuts down instead of restarting.
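
The failure mode can be illustrated with a small sketch. It is a defensive alternative to the fix proposed later in this issue, not the Hermes implementation: `spawn_detached` is a hypothetical helper that tries the breakaway flag first and retries without it when Windows answers with access denied (which Python surfaces as PermissionError). The `popen` parameter exists only so the logic can be exercised without Windows.

```python
import subprocess

# Win32 process creation flags (numeric so the sketch imports on any OS).
CREATE_BREAKAWAY_FROM_JOB = 0x01000000
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
CREATE_NO_WINDOW = 0x08000000

BASE_DETACH_FLAGS = CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW


def spawn_detached(argv, popen=subprocess.Popen):
    """Spawn argv detached; drop the breakaway flag if the job forbids it."""
    common = dict(
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return popen(
            argv,
            creationflags=BASE_DETACH_FLAGS | CREATE_BREAKAWAY_FROM_JOB,
            **common,
        )
    except PermissionError:
        # [WinError 5]: the enclosing job does not allow breakaway.
        return popen(argv, creationflags=BASE_DETACH_FLAGS, **common)
```

The trade-off is one extra failed spawn attempt inside restricted jobs, in exchange for keeping breakaway where the job permits it.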

The Secondary Issue (Headless PATH Resolution)

Once the watcher is spawned, it executes:
[*hermes_cmd, "gateway", "restart"]

Here hermes_cmd evaluates to ["hermes"], based on shutil.which("hermes"). In headless background configurations, where the process does not inherit the PATH or aliases set up by the user's shell profile, "hermes" may not resolve when the watcher tries to start it.
Additionally, running gateway restart invokes _exec_schtasks directly inside the task loop. If the schtasks call is blocked by UAC or an administrative policy, the restart is silently dropped.
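
The PATH dependency can be avoided by falling back to the running interpreter whenever the shim cannot be found. A minimal sketch follows, assuming the hermes_cli.main module named elsewhere in this issue is importable by sys.executable; `resolve_gateway_argv` and its parameters are hypothetical names used for illustration.

```python
import shutil
import sys


def resolve_gateway_argv(search_path=None, name="hermes", module="hermes_cli.main"):
    """Build the argv for `gateway run`, preferring the shim when it resolves.

    search_path is forwarded to shutil.which(); None means the current PATH.
    """
    found = shutil.which(name, path=search_path)
    # Fall back to `python -m <module>` when the shim is not on the search path.
    launcher = [found] if found else [sys.executable, "-m", module]
    return [*launcher, "gateway", "run"]
```

Calling `resolve_gateway_argv(search_path=os.environ.get("PATH"))` from inside the background service shows which launcher a detached child would actually get.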

The Solution

Instead of forcing breakaway flags, the background watcher should:

  1. Spawn the subprocess using windows_detach_flags_without_breakaway() directly to avoid the access-denied error. The watcher only polls a PID, so it does not need to break away from the job.
  2. Call gateway run instead of gateway restart inside the watcher, using sys.executable -m hermes_cli.main to bypass PATH resolution issues.

Exact Patch:

# gateway/run.py, around line 4023:

    async def _launch_detached_restart_command(self) -> None:
        import shutil
        import subprocess

        hermes_cmd = _resolve_hermes_bin()
        if not hermes_cmd:
            logger.error("Could not locate hermes binary for detached /restart")
            return

        current_pid = os.getpid()

        # On Windows there's no bash/setsid chain — spawn a tiny Python
        # watcher directly via sys.executable instead. Runs the gateway directly
        # via sys.executable -m hermes_cli.main (bypassing the hermes
        # shim, which may not resolve in a detached background process).
        if sys.platform == "win32":
            import textwrap
            from hermes_cli._subprocess_compat import (
                windows_detach_flags_without_breakaway,
            )

            # Use sys.executable directly — avoids PATH/hermes-shim issues
            # in the detached subprocess context where user PATH may differ.
            cmd_argv = [sys.executable, "-m", "hermes_cli.main", "gateway", "run"]
            watcher = textwrap.dedent(
                """
                import os, subprocess, sys, time
                pid = int(sys.argv[1])
                cmd = sys.argv[2:]
                deadline = time.monotonic() + 120

                def _alive(p):
                    # On Windows, os.kill(pid, 0) is NOT a no-op — it maps to
                    # GenerateConsoleCtrlEvent(0, pid) (bpo-14484). Use the
                    # Win32 handle-based existence check instead.
                    if os.name == 'nt':
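                        import ctypes
                        # use_last_error=True makes ctypes capture the error code
                        # right after each call, so it cannot be clobbered.
                        k32 = ctypes.WinDLL('kernel32', use_last_error=True)
                        k32.OpenProcess.restype = ctypes.c_void_p
                        k32.WaitForSingleObject.restype = ctypes.c_uint
                        # PROCESS_QUERY_LIMITED_INFORMATION | SYNCHRONIZE
                        h = k32.OpenProcess(0x1000 | 0x100000, False, int(p))
                        if not h:
                            # 87 = ERROR_INVALID_PARAMETER: no such PID.
                            return ctypes.get_last_error() != 87
                        h = ctypes.c_void_p(h)  # pass the full-width handle
                        try:
                            # 0x102 = WAIT_TIMEOUT: process is still running.
                            return k32.WaitForSingleObject(h, 0) == 0x102
                        finally:
                            k32.CloseHandle(h)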
                    try:
                        os.kill(int(p), 0)
                        return True
                    except ProcessLookupError:
                        return False
                    except PermissionError:
                        return True
                    except OSError:
                        return False

                while time.monotonic() < deadline:
                    if not _alive(pid):
                        break
                    time.sleep(0.2)
                _CREATE_NEW_PROCESS_GROUP = 0x00000200
                _DETACHED_PROCESS = 0x00000008
                _CREATE_NO_WINDOW = 0x08000000
                subprocess.Popen(
                    cmd,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    creationflags=_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW,
                )
                """
            ).strip()
            # Spawn the watcher with flags that do not include CREATE_BREAKAWAY_FROM_JOB
            # to prevent [WinError 5] Access Denied errors inside restricted Windows
            # job objects or scheduled background tasks.
            subprocess.Popen(
                [sys.executable, "-c", watcher, str(current_pid), *cmd_argv],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                creationflags=windows_detach_flags_without_breakaway(),
            )
            return
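
One detail of the watcher above: its polling loop ends either when the old PID exits or when the 120-second deadline passes, and the new gateway is launched in both cases. If launching while the old process is still alive is undesirable, the wait can report its outcome so the caller can decide. A minimal sketch under that assumption, with `wait_for_exit` as a hypothetical helper and the liveness check injected as a callable:

```python
import time


def wait_for_exit(is_alive, timeout=120.0, poll_interval=0.2,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll is_alive() until it returns False or the timeout elapses.

    Returns True if the process was seen to exit, False if it was still
    alive when the deadline passed.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if not is_alive():
            return True
        sleep(poll_interval)
    # Deadline reached: take one final reading rather than assuming.
    return not is_alive()
```

In the watcher this would be used as `if wait_for_exit(lambda: _alive(pid)): subprocess.Popen(...)`, skipping the relaunch when the old gateway never exited.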

Metadata

Labels: P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter), type/bug (Something isn't working)
