dashboard: PermissionError on stale root-owned gateway.lock crashes /api/status with HTTP 500 #18935

@renne

Problem

When the hermes gateway container is started without entrypoint.sh (i.e. the gosu-based privilege drop is bypassed), gateway processes run as root and create gateway.lock owned by root:root with mode 0600.

After correcting the entrypoint, the dashboard process (running as hermes, uid 10000) polls /api/status, which calls get_running_pid() → is_gateway_runtime_lock_active() → open(lock_path, 'a+'). Because the file is root-owned with mode 0600, this raises PermissionError.

There is no try/except around get_running_pid() in the /api/status handler (web_server.py ~line 518), so the exception propagates as HTTP 500. The SSE/events client receives the error response and the frontend shows:
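As a complement to fixing the lock check itself, the handler could also guard the call so that a failed probe degrades the status response instead of producing a 500. The following is only a minimal sketch: `safe_running_pid`, `build_status`, and the response field names are hypothetical and do not come from web_server.py; the real handler's shape is not shown in this issue.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_running_pid(get_pid: Callable[[], Optional[int]]) -> Optional[int]:
    """Call get_pid(), treating OS-level failures as 'no gateway running'."""
    try:
        return get_pid()
    except OSError as exc:  # PermissionError is a subclass of OSError
        logger.warning("gateway status probe failed: %s", exc)
        return None


def build_status(get_pid: Callable[[], Optional[int]]) -> dict:
    """Build a status payload (hypothetical field names) without raising."""
    pid = safe_running_pid(get_pid)
    return {"gateway_running": pid is not None, "gateway_pid": pid}
```

With this shape, a PermissionError from the lock check would surface as `gateway_running: False` plus a log line, rather than as an error response to the events client.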

events feed disconnected — tool calls may not appear

Reproduction

  1. Start hermes container without entrypoint.sh so it runs as root.
  2. Let it create gateway.lock (owned root:root).
  3. Fix the entrypoint (re-add entrypoint.sh), restart the container.
  4. Dashboard polls /api/status → PermissionError → HTTP 500 → events feed shows disconnected.
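The failing open() can be approximated locally without a container. This sketch is an emulation, not the exact scenario: it uses a temporary directory in place of $HERMES_HOME and `chmod 0o000` as a stand-in for a root:root 0600 file as seen by an unprivileged user. The helper name `emulate_stale_lock` is made up for illustration. When run as root, the open() succeeds because root bypasses the permission bits.

```python
import os
import tempfile
from pathlib import Path


def emulate_stale_lock() -> tuple[bool, bool]:
    """Return (open_failed, unlinked) for an inaccessible lock file."""
    with tempfile.TemporaryDirectory() as home:
        lock_path = Path(home) / "gateway.lock"
        lock_path.write_text("12345\n", encoding="utf-8")
        # Stand-in for "root:root 0600 as seen by uid 10000": no access at all.
        os.chmod(lock_path, 0o000)
        try:
            with open(lock_path, "a+", encoding="utf-8"):
                open_failed = False  # only expected when running as root
        except PermissionError:
            open_failed = True
        # Removing a file is governed by the directory's permissions,
        # so this works even though the file itself cannot be opened.
        lock_path.unlink()
        return open_failed, not lock_path.exists()
```

Calling `emulate_stale_lock()` as an unprivileged user should report that the open failed and that the file could still be removed.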

Root cause

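is_gateway_runtime_lock_active() in gateway/status.py opens the lock file with 'a+' (append-and-read mode, which requires both read and write permission) even when it is only checking lock status. If the file is owned by another user and its mode denies access to the caller, as with root:root 0600 here, open() raises PermissionError and nothing handles it.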

Fix

Wrap the open() call in a PermissionError handler. Since the hermes user owns the parent directory ($HERMES_HOME), it can unlink() the stale file despite not owning it:

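try:
    handle = open(resolved_lock_path, 'a+', encoding='utf-8')
except PermissionError:
    # Lock file exists but belongs to another user (e.g. root): treat it as stale.
    logger.warning('gateway.lock at %s not accessible; removing stale lock file.', resolved_lock_path)
    try:
        resolved_lock_path.unlink()
    except OSError:
        # Best effort: if removal fails, still report the lock as inactive.
        pass
    return False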

This is self-healing: the stale file is removed on first poll and subsequent checks work normally.
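The self-healing behaviour can be exercised in isolation with a sketch like the one below. It is not the actual gateway/status.py code: `lock_is_active` and its injectable `opener` parameter are hypothetical, and the use of `fcntl.flock` to decide whether the lock is held is an assumption about how the gateway takes its lock. The opener is injectable only so the PermissionError path can be triggered without a root-owned file.

```python
import fcntl
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def lock_is_active(lock_path: Path, opener=open) -> bool:
    """Return True if another process appears to hold the lock file."""
    try:
        handle = opener(lock_path, "a+", encoding="utf-8")
    except PermissionError:
        logger.warning(
            "gateway.lock at %s not accessible; removing stale lock file.",
            lock_path,
        )
        try:
            lock_path.unlink()
        except OSError:
            pass  # best effort; still report the lock as inactive
        return False
    with handle:
        try:
            # Assumption: the gateway holds an exclusive flock while running.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True  # someone else holds the lock
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
        return False
```

On the first call with an inaccessible file, the function removes it and returns False; a later call recreates the file through 'a+' under the caller's own uid and proceeds to the normal lock check.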

Labels: P2 (Medium: degraded but workaround exists); area/docker (Docker image, Compose, packaging); comp/gateway (Gateway runner, session dispatch, delivery); type/bug (Something isn't working)
