feat(docker): auto-redirect gateway run to supervised mode inside s6 image#33583
Merged
Conversation
…6 image
Pre-s6, `docker run nousresearch/hermes-agent gateway run` was the
standard invocation: gateway ran as the container's main process,
tini reaped zombies, container exit code matched gateway exit code,
no supervision. With s6-overlay as PID 1, the same invocation now
auto-upgrades to supervised semantics — auto-restart on crash,
dashboard supervised alongside (when HERMES_DASHBOARD=1 is set),
multiple profile gateways under the same /init.
Users get the new behavior with zero changes to their docker run
command. A loud one-line breadcrumb on stderr explains the upgrade
and points at the opt-out for users who genuinely want pre-s6
foreground semantics.
How it works:
1. `_gateway_command_inner` (the `gateway run` handler) checks if
we're inside a container with s6 as PID 1.
2. If yes, dispatches `start` to the s6 service manager (registers
and starts gateway-default), then `exec sleep infinity` to keep
the CMD process alive without binding container lifetime to
gateway PID lifetime. The supervised gateway can flap freely;
`docker stop` still tears everything down via /init stage 3.
3. If no, falls through to the existing foreground code path
unchanged. Host runs of `hermes gateway run` are unaffected.
Three gates make the redirect inert outside the intended scope:
* `detect_service_manager() != "s6"` — host/non-s6-container runs.
* `HERMES_S6_SUPERVISED_CHILD=1` env var (recursion guard) —
exported by `S6ServiceManager._render_run_script` for the
s6-supervised invocation itself. Without this guard, the
supervised `gateway run --replace` would re-enter the redirect
and recurse (run → start → run → start → ...) infinitely.
* `--no-supervise` CLI flag OR `HERMES_GATEWAY_NO_SUPERVISE=1` env
var — explicit user opt-out for CI smoke tests, debugging the
foreground startup path, or any case wanting "CMD exit =
container exit" semantics. Strict truthiness (1/true/yes,
case-insensitive); typos like `=0` do NOT silently opt out.
Tests:
* Unit tests in tests/hermes_cli/test_gateway_s6_dispatch.py
cover all five paths (host no-op, supervised fire, sentinel
recursion guard, CLI flag, env var truthy + falsy). The two
load-bearing gates (sentinel + opt-out) were mutation-tested
by removing each gate in isolation and confirming the dedicated
test fails with the expected error.
* Docker harness tests in tests/docker/test_gateway_run_supervised.py
cover the round trips end-to-end against a built image: redirect
fires (sleep-infinity heartbeat + supervised gateway-default
slot + breadcrumb), --no-supervise opt-out (foreground gateway,
no want-up on the slot), HERMES_GATEWAY_NO_SUPERVISE env var
works identically, recursion is impossible (≤1 supervised
python gateway-run + exactly 1 sleep-infinity parented to the
CMD wrapper), and HERMES_DASHBOARD=1 produces both supervised
gateway and supervised dashboard.
Docs:
* Added a `:::tip Gateway runs supervised` admonition near the
main docker.md example explaining the upgrade and pointing at
the opt-out. Pre-s6 (tini-based) images still run gateway run
as the foreground main process, so the note is scoped to the
s6 image only.
Trade-off documented in the helper docstring: container exit code
under the redirect is sleep's exit code (always 0 on SIGTERM), not
the gateway's. That was an explicit design call — the supervised
gateway is allowed to flap without taking the container with it,
which is what "supervision" means. CI users who want exit-code
forwarding can pass --no-supervise.
Contributor
🔎 Lint report:
|
benbarclay
added a commit
that referenced
this pull request
May 28, 2026
Follow-up to #33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
10 tasks
benbarclay
added a commit
that referenced
this pull request
May 28, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike #33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in #33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
r266-tech
added a commit
to r266-tech/hermes-agent
that referenced
this pull request
May 28, 2026
mathias3
pushed a commit
to mathias3/hermes-agent
that referenced
this pull request
May 28, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
mathias3
pushed a commit
to mathias3/hermes-agent
that referenced
this pull request
May 28, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
pull Bot
pushed a commit
to HSKIMRobert/hermes-agent
that referenced
this pull request
May 28, 2026
…E from NousResearch#33583 (NousResearch#33751) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)
1 task
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green. #AI commit#
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision. #AI commit#
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
…E from NousResearch#33583 (NousResearch#33751) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) #AI commit#
zwolniony
pushed a commit
to zwolniony/hermes-agent
that referenced
this pull request
May 29, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
zwolniony
pushed a commit
to zwolniony/hermes-agent
that referenced
this pull request
May 29, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
mosaiq-systems
pushed a commit
to mosaiq-systems/hermes-agent
that referenced
this pull request
May 29, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
mosaiq-systems
pushed a commit
to mosaiq-systems/hermes-agent
that referenced
this pull request
May 29, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
KKT-OPT
pushed a commit
to KKT-OPT/hermes-agent
that referenced
this pull request
May 31, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
KKT-OPT
pushed a commit
to KKT-OPT/hermes-agent
that referenced
this pull request
May 31, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
KKT-OPT
pushed a commit
to KKT-OPT/hermes-agent
that referenced
this pull request
May 31, 2026
…E from NousResearch#33583 (NousResearch#33751) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
Follow-up to NousResearch#33583 (the gateway-run-supervised redirect). Before this fix, the supervised gateway's stdout (most visibly the "Hermes Gateway Starting…" rich-console banner) was swallowed by `s6-log` into the rotated file at `${HERMES_HOME}/logs/gateways/<profile>/current` and never reached `docker logs`. Operational signal lived in two places: * **docker logs** — saw stderr (Python `logging` defaults to stderr), so warnings/errors were visible. * **the rotated file** — saw stdout (rich banners, `print()` output, third-party libs that wrote to fd 1). This was surprising for users coming from the pre-s6 image, where `docker run … gateway run` produced a single unified stream in `docker logs`. They'd see partial output, conclude something was broken, and dig around for the missing pieces. Fix: add the `1` s6-log action directive before the file destination so each line is forwarded to s6-log's stdout — which propagates up the s6-supervise pipeline to /init's stdout = container stdout = `docker logs`. The file destination is preserved as a second destination, so the rotated log (with ISO 8601 timestamps) still exists for `hermes logs` and for survival across container restarts. Trade-off considered: timestamps. Putting `T` between `1` and the file destination (not before `1`) means: * docker logs sees raw lines — Python's logging formatter has its own timestamps, and `docker logs --timestamps` adds another layer when desired. No double-stamping in the common reading path. * The persisted file gets s6-log's ISO 8601 timestamp so even output that lacked a Python-logger timestamp (rich banners, third-party raw prints) is correlatable in `current`. Verification: * New unit-test assertion in `test_service_manager.py` locks the `s6-log 1` directive into the rendered run-script. Mutation- tested by reverting to the pre-fix script (no `1`); the assert catches it cleanly. * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs` builds the image, runs `docker run … gateway run`, and asserts the unique `⚕` banner glyph reaches `docker logs`. Also verifies the rotated file still contains the banner (no regression on the existing file destination). Mutation-tested end-to-end: built a deliberately-broken image without the `1` directive and the test failed exactly as designed, citing the banner present in `current` but absent from `docker logs`. * `website/docs/user-guide/docker.md` gains a new `:::note Where gateway logs go` admonition documenting both destinations and the audit-log file at `${HERMES_HOME}/logs/container-boot.log`. Existing functionality preserved: every other docker-harness test still passes against the new image. Unit-test sweep across `tests/hermes_cli/` (5561 tests) is green.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
When operators ran `docker exec <c> hermes login` (or anything else that wrote under $HERMES_HOME) they defaulted to root, leaving /opt/data/auth.json root:root mode 0600. The supervised gateway (UID 10000) then couldn't read its own credentials and returned "Provider authentication failed: Hermes is not logged into Nous Portal" on every Telegram/Discord/etc. message — even though `docker exec <c> hermes chat -q ping` (also root) succeeded because root could read its own root-owned file. _load_auth_store swallowed PermissionError as a parse failure and copied the file aside as auth.json.corrupt, making the diagnostic more misleading. Fix: install a privilege-drop shim at /opt/hermes/bin/hermes, prepended ahead of the venv on PATH. When invoked as root the shim exec's the real venv binary via `s6-setuidgid hermes` — so any file the docker-exec session writes is uid-aligned with the supervised processes. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents, anything inside the container that's not coming through docker-exec) hit a single exec to the absolute venv path with no privilege change. Recursion is impossible: the shim exec's the venv binary by absolute path (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter the shim regardless of PATH state. No sentinel env var needed (unlike NousResearch#33583's gateway-run redirect which DOES need HERMES_S6_SUPERVISED_CHILD because there's no absolute-path equivalent for the s6 dispatch). Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for diagnostic sessions where the operator deliberately wants root. Strict truthiness (1/true/yes case-insensitive); typos like `=0` do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in NousResearch#33583. If `s6-setuidgid` is missing (someone stripped s6-overlay in a downstream fork), the shim exits 126 with a remediation message pointing at `--user hermes` and the opt-out — never silently runs as root. Test plan: - tests/docker/test_docker_exec_privilege_drop.py — 11 tests - shim drops root to hermes uid (file ownership check) - shim short-circuits for non-root docker exec - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root - strict-truthiness parametrization (5 falsy values reject) - main CMD path unaffected (recursion guard) - E2E: every file written by docker-exec is readable by uid 10000 - Full tests/docker/ harness: 32/32 pass against fresh image build - shellcheck --severity=error: clean - hadolint: clean - Manual: reproduced the original symptom (root-owned auth.json) by bypassing the shim; confirmed default docker-exec produces hermes-owned files; confirmed opt-out env keeps root semantics. Known follow-up: this prevents NEW instances of the bug. Volumes that already have root:root /opt/data/auth.json from a pre-shim image need a one-time `chown hermes:hermes` before rebooting onto the new image. A stage2-hook chown sweep can self-heal that, but is deferred per scope decision.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…E from NousResearch#33583 (NousResearch#33751) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Pre-s6,
docker run nousresearch/hermes-agent gateway runwas the standard invocation: gateway ran as the container's main process, tini reaped zombies, container exit code matched gateway exit code, no supervision. With s6-overlay as PID 1, the same invocation now auto-upgrades to supervised semantics — auto-restart on crash, dashboard supervised alongside (whenHERMES_DASHBOARD=1), multiple profile gateways under the same/init.Users get the new behavior with zero changes to their
docker runcommand. A loud one-line breadcrumb on stderr explains the upgrade and points at the opt-out for users who genuinely want pre-s6 foreground semantics.How it works
_gateway_command_inner(thegateway runhandler) checks if we're inside a container with s6 as PID 1.startto the s6 service manager (registers and startsgateway-default), thenexec sleep infinityto keep the CMD process alive without binding container lifetime to gateway PID lifetime. The supervised gateway can flap freely;docker stopstill tears everything down cleanly via/initstage 3.hermes gateway runare unaffected.Three gates make the redirect inert outside the intended scope
detect_service_manager() != "s6"HERMES_S6_SUPERVISED_CHILD=1env varS6ServiceManager._render_run_script. Without this, the supervisedgateway run --replacewould re-enter the redirect and recurse (run → start → run → start → ...) infinitely--no-superviseCLI flag orHERMES_GATEWAY_NO_SUPERVISE=1env var=0do NOT silently opt outTrade-off (deliberate)
Container exit code under the redirect is
sleep's exit code (always 0 on SIGTERM), not the gateway's. The supervised gateway is allowed to flap without taking the container with it — that's what "supervision" means. CI users who want exit-code forwarding can pass--no-supervise.Test Plan
Unit tests —
tests/hermes_cli/test_gateway_s6_dispatch.py(+5 tests)test_redirect_noop_on_host— host runs are unaffectedtest_redirect_fires_inside_s6_container— dispatches start, prints breadcrumb to stderr, execssleep infinitytest_redirect_short_circuits_supervised_child— recursion guard workstest_redirect_respects_no_supervise_flag— CLI opt-out workstest_redirect_respects_no_supervise_env(parametrized over 5 truthy values) — env opt-out workstest_redirect_no_supervise_env_falsy_values_dont_opt_out— strict truthiness (0/false/no/garbage do NOT opt out)Mutation-tested: removed the sentinel gate and the opt-out gate in isolation; the dedicated tests fail with the expected pytest.fail messages each time. Both gates are load-bearing.
Docker harness tests —
tests/docker/test_gateway_run_supervised.py(new file, 5 tests)End-to-end against a built image:
test_gateway_run_redirects_to_supervised—sleep infinityheartbeat, supervisedgateway-defaultslot at want-up, breadcrumb in docker logstest_gateway_run_no_supervise_flag_preserves_legacy_behavior— foreground gateway, no want-up on slot, no breadcrumbtest_gateway_run_no_supervise_env_var— env-var opt-out works identicallytest_supervised_gateway_does_not_recurse— exactly 1 supervised pythongateway run+ exactly 1sleep infinityparented to CMD wrappertest_dashboard_supervised_when_env_set—HERMES_DASHBOARD=1gets you both supervised gateway and supervised dashboardService manager —
tests/hermes_cli/test_service_manager.pyexport HERMES_S6_SUPERVISED_CHILD=1sentinel into the rendered run-script assertionsRegression sweep
tests/hermes_cli/+tests/gateway/— 11,204 pass, 0 failtests/docker/— full suite passes againsthermes-agent-harness:s6runbuild of this branchDocs
Added a
:::tip Gateway runs supervisedadmonition near the main docker.md example explaining the upgrade and pointing at the opt-out. Pre-s6 (tini-based) images still rungateway runas the foreground main process, so the note is scoped to the s6 image only.Files
hermes_cli/service_manager.pyexport HERMES_S6_SUPERVISED_CHILD=1to_render_run_scripthermes_cli/gateway.py_maybe_redirect_run_to_s6_supervision()helper called from_gateway_command_innerhermes_cli/main.py--no-supervisearg on thegateway runsubparser with help texttests/hermes_cli/test_service_manager.pytests/hermes_cli/test_gateway_s6_dispatch.pytests/docker/test_gateway_run_supervised.pywebsite/docs/user-guide/docker.md:::tipadmonition explaining behavior + opt-outInfographic