Bug Description
The dashboard's /api/status endpoint reports gateway_running: false / gateway_state: "stopped" / gateway_pid: null when Hermes runs as the container entrypoint in Docker/Kubernetes (PID-1 pattern), even though the gateway is fully functional: the hermes status CLI correctly reports ✓ running / docker (foreground), cron jobs deliver, platform handlers (Feishu/Telegram/etc.) connect, and request handling works.
This is the dashboard counterpart to #4776 (CLI status path). PR #4792 was auto-closed by hermes-sweeper on the grounds that the CLI path was already refactored upstream into hermes_cli.gateway.get_gateway_runtime_snapshot() with is_container() detection. That is correct for the CLI, but the dashboard's /api/status handler in hermes_cli/web_server.py takes a different code path that still calls gateway.status.get_running_pid() (lock/pidfile-based), which the refactor did not touch. The sweeper review missed this case.
Steps to Reproduce
- Run Hermes via Docker with the standard image (v0.13.0 / 2026.5.7):
    command: ["gateway", "run"]
    environment:
      HERMES_DASHBOARD: "1"
      HERMES_DASHBOARD_HOST: 0.0.0.0
      HERMES_DASHBOARD_PORT: "9119"
- Wait for the container to become healthy.
- Confirm gateway is alive:
$ docker exec hermes-agent hermes status
◆ Gateway Service
Status: ✓ running
Manager: docker (foreground)
PID(s): 7
- Hit dashboard:
$ curl http://127.0.0.1:9119/api/status
{"version":"0.13.0","gateway_running":false,"gateway_pid":null,
"gateway_state":"stopped","gateway_platforms":{},...}
Expected Behavior
/api/status should report gateway_running: true / gateway_state: "running" whenever a hermes gateway run process is alive in the container — consistent with what hermes status CLI reports.
Actual Behavior
The endpoint always reports stopped in this setup. This affects every dashboard consumer: TUI dashboard widgets, the status badge, and anything else polling /api/status.
Root Cause (verified)
hermes_cli/web_server.py:537-545 /api/status handler:
gateway_pid = get_running_pid() # from gateway.status — depends on pid/lock files
gateway_running = gateway_pid is not None
gateway.status.get_running_pid() first checks is_gateway_runtime_lock_active(), which depends on gateway.pid + gateway.lock being present. In the PID-1 entrypoint pattern these files are never reliably written (the fcntl lock fd is released after startup; the pidfile is then cleaned up by _cleanup_invalid_pid_path). So get_running_pid() returns None → handler reports stopped.
Affected Component
Web server (hermes_cli/web_server.py, line ~537)
Environment
- Hermes v0.13.0 (2026.5.7)
- Linux container (Debian 13.x base)
- Python 3.13.5
- cap_drop: [ALL] + selective adds (representative of locked-down deployments)
Proposed Fix
Mirror PR #4792's pgrep fallback approach, applied to the dashboard handler. Add an is_container()-gated _scan_gateway_pid_in_container() helper invoked when the local pid/lock check returns None. Use pgrep -f "hermes gateway run" for the candidate list, then re-validate each PID via /proc/<pid>/cmdline argv tokens (must contain gateway and run as independent tokens) to defend against pgrep -f's substring matching accidentally hitting python -c debug invocations.
Are you willing to submit a PR for this?