Problem or Use Case
Hermes’ Docker image can now expose a generic health signal based on whether PID 1 is alive, but that is still weaker than what Docker could report for the two main long-running service modes: hermes gateway run and hermes dashboard.
For those modes, process liveness is not always the best signal:
- a gateway process may still exist while Hermes has already recorded startup_failed or otherwise is not operational
- a dashboard process may still exist even if the local web server is not actually serving requests
As a result, Docker health status is less useful than it could be for Compose deployments, dashboards, restart policies, and operational monitoring. Users running Hermes as a service would benefit from an application-level healthcheck when Hermes is in a known service mode, while still preserving a safe fallback for interactive CLI and one-off commands.
Proposed Solution
Make the Docker healthcheck script mode-aware by inspecting PID 1’s command line and selecting the probe strategy based on the active Hermes mode.
Suggested behavior:
- If PID 1 is running hermes gateway run:
  - read the gateway runtime status from Hermes’ persisted status file
  - report healthy only when the gateway state indicates a live running gateway
  - report unhealthy for states like startup_failed, missing/stale state, or a dead gateway process
- If PID 1 is running hermes dashboard:
  - probe the local dashboard server with an HTTP request
  - use the existing dashboard status endpoint, e.g. GET /api/status
  - report healthy only when the local dashboard responds successfully
- For all other commands:
  - fall back to the generic process-level check
  - healthy if PID 1 exists and is not a zombie
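The three probes above could look roughly like the following POSIX shell sketch. It is illustrative only: the status file location (read here from a hypothetical HERMES_GATEWAY_STATUS_FILE variable), the assumed JSON layout with "state" and "pid" fields, the hypothetical HEALTHCHECK_DASHBOARD_PORT variable and its placeholder default of 8080, and the presence of curl in the image are all assumptions, not Hermes' actual values. Each function takes an optional argument so it can be exercised outside a container.

```sh
#!/bin/sh
# Sketch of the three probes. File paths, the status-file layout, and the
# dashboard port are ASSUMPTIONS for illustration, not Hermes' real values.

# Gateway probe: healthy only if the persisted status says "running" and the
# recorded process still exists.
# ASSUMPTION: the status file is JSON containing "state" and "pid" fields.
probe_gateway() {
    status_file="${1:-${HERMES_GATEWAY_STATUS_FILE:-}}"
    # Missing or unreadable state counts as unhealthy.
    [ -n "$status_file" ] && [ -r "$status_file" ] || return 1
    state="$(sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$status_file" | head -n 1)"
    pid="$(sed -n 's/.*"pid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$status_file" | head -n 1)"
    [ "$state" = "running" ] || return 1   # e.g. startup_failed
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null             # dead gateway process -> unhealthy
}

# Dashboard probe: healthy only if GET /api/status succeeds locally.
# ASSUMPTION: curl is installed; the port default is a placeholder.
probe_dashboard() {
    port="${HEALTHCHECK_DASHBOARD_PORT:-8080}"
    url="${1:-http://127.0.0.1:$port/api/status}"
    # -f turns HTTP errors (>= 400) into a non-zero exit status.
    curl -fsS --max-time 3 -o /dev/null "$url"
}

# Generic fallback: healthy if PID 1 exists and is not a zombie (or dead).
probe_generic() {
    proc_status="${1:-/proc/1/status}"
    [ -r "$proc_status" ] || return 1
    state="$(awk '/^State:/ { print $2; exit }' "$proc_status")"
    case "$state" in
        ""|Z|X) return 1 ;;
    esac
    return 0
}
```

Every probe fails closed: an unreadable file, an unparseable state, or a missing curl binary all yield a non-zero status, which Docker would treat as unhealthy.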
Implementation outline:
- keep the Dockerfile HEALTHCHECK pointing to a single docker/healthcheck.sh
- in the script, inspect /proc/1/cmdline to detect the active mode
- branch to:
  - gateway-aware probe
  - dashboard-aware probe
  - generic fallback
- document the behavior clearly in the Docker docs so users understand that health semantics depend on the Hermes mode being run
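As a rough illustration of the outline above, mode detection and dispatch in docker/healthcheck.sh might be sketched as follows. The function names (detect_mode, run_healthcheck, probe_*) are hypothetical, and the probe_* bodies are stand-ins that fail closed; a real script would supply actual probes. The matching on "hermes gateway run" and "hermes dashboard" assumes those words appear as consecutive arguments in PID 1's command line, including when an init wrapper or interpreter precedes them.

```sh
#!/bin/sh
# Sketch: pick a probe strategy from PID 1's command line.
# All function names here are hypothetical.

# Classify the mode from a NUL-separated cmdline file (default /proc/1/cmdline).
detect_mode() {
    cmdline_file="${1:-/proc/1/cmdline}"
    # /proc/<pid>/cmdline separates arguments with NUL bytes; map them to spaces.
    cmd="$(tr '\000' ' ' 2>/dev/null < "$cmdline_file")" || cmd=""
    # The appended space lets the patterns match a final argument as well.
    case "$cmd " in
        *"hermes gateway run "*) echo gateway ;;
        *"hermes dashboard "*)   echo dashboard ;;
        *)                       echo generic ;;
    esac
}

# Stand-in probes: placeholders that report unhealthy until replaced.
probe_gateway()   { echo "gateway probe not implemented" >&2; return 1; }
probe_dashboard() { echo "dashboard probe not implemented" >&2; return 1; }
probe_generic()   { echo "generic probe not implemented" >&2; return 1; }

# Dispatch to the probe for the detected mode; unknown modes use the fallback.
run_healthcheck() {
    case "$(detect_mode "${1:-/proc/1/cmdline}")" in
        gateway)   probe_gateway ;;
        dashboard) probe_dashboard ;;
        *)         probe_generic ;;
    esac
}
```

A script built this way would end with something like `run_healthcheck || exit 1`, since Docker interprets exit status 0 as healthy and 1 as unhealthy, and reserves 2. Anything that does not clearly match a known service mode, including an unreadable cmdline, lands on the generic fallback.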
This should remain conservative:
- no assumptions about arbitrary one-off subcommands
- no requirement that every container mode expose HTTP
- no breaking behavior for interactive CLI usage
Alternatives Considered
Keep the current generic PID 1 healthcheck for every mode. This is simple and safe, but it does not tell Docker whether the gateway or dashboard is actually operational.
Use an HTTP-only healthcheck for all modes. This does not fit Hermes because many supported container invocations do not run an HTTP service.
Use a gateway-only healthcheck. That would improve the most common service mode, but it would still miss the dashboard case and would make the image behavior less consistent across supported long-running modes.
Feature Type
Performance / reliability
Scope
Medium (few files, < 300 lines)
Contribution
Debug Report (optional)