Problem
When the hermes gateway container is started without entrypoint.sh (i.e. the gosu-based privilege drop is bypassed), gateway processes run as root and create gateway.lock owned by root:root 0600.
After correcting the entrypoint, the dashboard process (running as hermes, uid 10000) polls /api/status, which calls get_running_pid() → is_gateway_runtime_lock_active() → open(lock_path, 'a+'). Because the file is root-owned, this raises PermissionError.
There is no try/except around get_running_pid() in the /api/status handler (web_server.py ~line 518), so the exception propagates as HTTP 500. The SSE/events client receives the error response and the frontend shows:
events feed disconnected — tool calls may not appear
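As a defensive complement to the fix proposed further down, the status handler could catch the error itself so that a lock-file problem never surfaces as HTTP 500. The sketch below is only an illustration: build_status_payload and the payload keys are hypothetical names, not the actual web_server.py code, and get_pid stands in for get_running_pid().

```python
from typing import Callable, Optional


def build_status_payload(get_pid: Callable[[], Optional[int]]) -> dict:
    """Build a status dict without letting a lock-file error escape.

    Hypothetical helper: `get_pid` stands in for get_running_pid().
    """
    try:
        pid = get_pid()
    except PermissionError as exc:
        # Report the gateway as not running instead of failing the request.
        return {"gateway_running": False, "pid": None, "error": str(exc)}
    return {"gateway_running": pid is not None, "pid": pid, "error": None}
```

With a guard like this the dashboard would still receive a well-formed response while the lock file is inaccessible, although the underlying ownership problem would remain until the file is removed.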
Reproduction
- Start the hermes container without entrypoint.sh so it runs as root.
- Let it create gateway.lock (owned root:root).
- Fix the entrypoint (re-add entrypoint.sh), restart the container.
- Dashboard polls /api/status: PermissionError → HTTP 500 → events feed shows disconnected.
Root cause
is_gateway_runtime_lock_active() in gateway/status.py opens the lock file with 'a+' (append-and-read mode, which requires write permission) even when it only needs to check lock status. If the file is owned by another user and is not writable by the caller, open() raises PermissionError, and nothing handles it.
Fix
Wrap the open() call in a PermissionError handler. Since the hermes user owns the parent directory ($HERMES_HOME), it can unlink() the stale file despite not owning it:
    try:
        handle = open(resolved_lock_path, 'a+', encoding='utf-8')
    except PermissionError:
        logger.warning('gateway.lock at %s not accessible; removing stale lock file.', resolved_lock_path)
        try:
            resolved_lock_path.unlink()
        except OSError:
            pass
        return False
This is self-healing: the stale file is removed on first poll and subsequent checks work normally.
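For experimenting with this pattern outside the gateway code, here is a minimal self-contained sketch. The function name open_lock_or_remove_stale and its opener parameter are assumptions made for illustration (the parameter exists only so the PermissionError path can be exercised without a root-owned file); the real change lives inside is_gateway_runtime_lock_active().

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def open_lock_or_remove_stale(lock_path: Path, opener=open):
    """Return an open handle, or None if the file was inaccessible.

    Hypothetical helper: `opener` defaults to the built-in open() and is
    injectable only for testing.
    """
    try:
        return opener(lock_path, "a+", encoding="utf-8")
    except PermissionError:
        logger.warning(
            "gateway.lock at %s not accessible; removing stale lock file.",
            lock_path,
        )
        try:
            # Removing a file needs write permission on the parent
            # directory, not ownership of the file itself.
            lock_path.unlink()
        except OSError:
            # Best effort: if removal fails, still report "not active".
            pass
        return None
```

A caller would treat None as "lock not active" and return False, mirroring the snippet above.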
Related
--insecure + reverse proxy