os.kill(pid, 0) in gateway/status.py raises unhandled OSError on Windows #14359

@hillerliao

Description

On Windows, when a stale gateway.pid file exists (e.g. after a non-graceful shutdown), restarting the gateway fails because os.kill(existing_pid, 0) raises OSError: [WinError 11] instead of ProcessLookupError.

Location

gateway/status.py, around line 364:

try:
    os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError):
    stale = True

Problem

On Windows, when the PID no longer exists, os.kill can raise a generic OSError (e.g. [WinError 11]), which is not caught by the current handler. This causes the gateway to fail to start until the PID file is manually deleted.

This issue recurs every time the gateway exits non-gracefully (closing terminal, system crash, unhandled Ctrl+C, etc.).
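
The gap in the handler can be reproduced on any platform. The sketch below mirrors the except clause quoted under Location; caught_by_current_handler is a hypothetical helper for illustration, not code from gateway/status.py, and EINVAL merely stands in for whatever generic error Windows reports.

```python
import errno


def caught_by_current_handler(exc):
    """Return True if the existing except clause would swallow exc."""
    try:
        raise exc
    except (ProcessLookupError, PermissionError):
        return True
    except OSError:
        # A plain OSError falls through the current handler.
        return False


# OSError(ESRCH, ...) is constructed as ProcessLookupError, so it is caught.
print(caught_by_current_handler(OSError(errno.ESRCH, "no such process")))
# An OSError with no dedicated subclass is not caught.
print(caught_by_current_handler(OSError(errno.EINVAL, "generic failure")))
```

Python maps only specific errno values to subclasses such as ProcessLookupError, so any error outside that mapping reaches the caller as a bare OSError.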

Suggested Fix

Add OSError to the exception tuple:

except (ProcessLookupError, PermissionError, OSError):
    stale = True

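Note: ProcessLookupError and PermissionError are both subclasses of OSError, so this is a safe superset of the current handler and is equivalent to catching OSError alone. If that feels too broad, an alternative is to catch the generic OSError only when sys.platform == "win32", or to catch OSError and filter by errno.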
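
A minimal sketch of the narrower, platform-gated alternative follows. The function name pid_is_stale and its kill and platform parameters are assumptions added here so the logic can be exercised without a real process; the actual code in gateway/status.py sets a stale flag inline.

```python
import os
import sys


def pid_is_stale(pid, kill=os.kill, platform=sys.platform):
    """Return True if the process recorded in the PID file looks gone.

    Hypothetical helper: kill and platform are injectable for testing.
    """
    try:
        kill(pid, 0)
    except (ProcessLookupError, PermissionError):
        # Same handling as the existing handler.
        return True
    except OSError:
        # Treat a generic OSError as "stale" only on Windows;
        # elsewhere it is unexpected, so let it propagate.
        if platform == "win32":
            return True
        raise
    return False
```

This keeps the POSIX behaviour unchanged and widens the handler only where the generic error has been observed. Filtering by errno or winerror would be stricter still, but the exact set of codes Windows can return here is not established in this report.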

Environment

  • OS: Windows 10/11
  • Python: 3.x (venv)
  • Hermes Gateway: latest

Labels

  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • duplicate: This issue or pull request already exists
  • sweeper:implemented-on-main: behavior already present on current main
  • type/bug: Something isn't working
