fix(docker): set S6_KEEP_ENV=1 to preserve container environment #33148

Closed
ugoenyioha wants to merge 1 commit into NousResearch:main from ugoenyioha:fix/s6-keep-env

@ugoenyioha

Summary

s6-overlay v3 strips the container's environment by default before exec'ing the main program (CMD). This causes all Kubernetes-injected env vars (MATTERMOST_TOKEN, MATTERMOST_URL, HERMES_HOME, HOME, provider API keys) to vanish by the time the hermes gateway process starts.

The gateway then fails to detect any messaging platforms because os.getenv() returns empty for every credential, producing:

WARNING gateway.run: No messaging platforms enabled.

Root Cause

The migration from tini (v2026.5.16) to s6-overlay on main changed the PID 1 from tini (which preserves env) to s6's /init (which strips env by default per the s6-overlay docs).

The #!/command/with-contenv shebang in s6 service scripts restores the env from /run/s6/container_environment/, but main-wrapper.sh (the CMD) uses plain #!/bin/sh and does NOT use with-contenv.
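To make the restore step concrete, here is a minimal sketch of what a with-contenv style loader does, assuming the directory holds one file per variable, named after the variable and containing its value. `load_contenv` is a hypothetical helper written for illustration only; the real `with-contenv` ships with s6-overlay and is not implemented this way.

```shell
#!/bin/sh
# Hypothetical emulation of the with-contenv restore step (illustration only).
# Assumption: each file in the directory is named after an environment
# variable and its contents are that variable's value.
load_contenv() {
    dir="${1:-/run/s6/container_environment}"
    [ -d "$dir" ] || return 0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Skip file names that are not valid shell identifiers.
        case "$name" in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        # Command substitution drops trailing newlines from the value.
        value=$(cat "$f")
        export "$name=$value"
    done
}
```

A script that starts with plain `#!/bin/sh` never runs a step like this, which would explain why the CMD wrapper saw none of the injected variables.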

Fix

One line: ENV S6_KEEP_ENV=1 in the Dockerfile. This preserves the full container environment for the main program and all cont-init.d scripts.
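The effect of the flag can be imitated outside a container with `env -i`. This is a rough sketch, not s6-overlay itself: `run_main` is a hypothetical stand-in for `/init` exec'ing the CMD, and treating only the value `1` as "keep" is an assumption taken from this PR.

```shell
#!/bin/sh
# Hypothetical emulation of the behaviour described in this PR.
# It does not call s6-overlay; run_main stands in for /init starting the CMD.
run_main() {
    if [ "${S6_KEEP_ENV:-0}" = "1" ]; then
        # Environment kept: the child sees every exported variable.
        "$@"
    else
        # Default: the child starts from an empty environment (PATH only).
        env -i PATH="$PATH" "$@"
    fi
}

# Usage: the same child command, with and without the flag.
export MATTERMOST_TOKEN=example
S6_KEEP_ENV=0
run_main sh -c 'echo "token len=${#MATTERMOST_TOKEN}"'   # prints: token len=0
S6_KEEP_ENV=1
run_main sh -c 'echo "token len=${#MATTERMOST_TOKEN}"'   # prints: token len=7
```

The trade-off raised later in the thread applies here too: keeping the environment globally changes behaviour for everything started under `/init`, while a per-script loader limits the change to the scripts that opt in.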

Verification

Tested on a two-tenant Hermes deployment on Talos Kubernetes:

  • Without fix: HERMES_HOME=<UNSET>, MATTERMOST_TOKEN len=0, gateway reports "No messaging platforms enabled"
  • With fix: HERMES_HOME=/opt/data, MATTERMOST_TOKEN len=26, gateway detects and connects to Mattermost
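
A check like the one above can be reproduced with a small helper that reports whether each variable is set and how long its value is, without printing the secret itself. `report_env` is a hypothetical name and the output format only mirrors the lines quoted above; the script actually used for this verification is not shown in the PR.

```shell
#!/bin/sh
# Hypothetical diagnostic: print NAME=<UNSET> or "NAME len=N" for each name.
# Values are never printed, so it is safe to use with credentials.
report_env() {
    for name in "$@"; do
        # Only accept valid identifiers, since the name is passed to eval.
        case "$name" in
            ''|[0-9]*|*[!A-Za-z0-9_]*) echo "$name=<INVALID>"; continue ;;
        esac
        eval "is_set=\${$name+x}"
        if [ -z "$is_set" ]; then
            echo "$name=<UNSET>"
        else
            eval "value=\${$name}"
            echo "$name len=${#value}"
        fi
    done
}

# Example call: report_env HERMES_HOME MATTERMOST_TOKEN
```

For the result to reflect what the gateway sees, the call would need to run inside the CMD wrapper itself. A shell opened with `docker exec` or `kubectl exec` typically receives the container's configured environment directly and so may not show the problem.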

Impact

This affects every Docker/Kubernetes deployment running from main since the s6-overlay migration. The v2026.5.16 release (tini-based) is not affected.

s6-overlay v3 strips the container's environment before execing the
main program (CMD). This causes all Kubernetes-injected env vars
(MATTERMOST_TOKEN, MATTERMOST_URL, HERMES_HOME, HOME, API keys)
to vanish by the time the hermes gateway process starts. The gateway
then fails to detect any messaging platforms because os.getenv()
returns empty for every credential.

v2026.5.16 used tini as PID 1 (no env stripping), so this was never
an issue. The migration to s6-overlay on main introduced the
regression.

Fix: ENV S6_KEEP_ENV=1 in the Dockerfile, per the s6-overlay docs.
This preserves the full container environment for the main program
and all cont-init.d scripts.
@alt-glitch added labels on May 27, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), area/docker (Docker image, Compose, packaging), backend/docker (Docker container execution)
@alt-glitch

Collaborator

Related to merged #32412 which fixed s6-overlay env stripping using #!/command/with-contenv shebang on main-wrapper.sh and cont-init scripts. This PR takes a different approach — S6_KEEP_ENV=1 globally preserves the container environment. Both address the same root cause (#33004, #33001). Verify whether #32412's with-contenv approach already resolves this for K8s deployments, or if S6_KEEP_ENV=1 is still needed as a belt-and-suspenders fix.

@smiggiddy

I'm still seeing issues on k8s even after using #32412

@benbarclay

Collaborator

Thanks for chasing this down @ugoenyioha — your root-cause analysis is exactly right (s6-overlay v3 scrubs env before exec'ing CMD). The fix already landed on main via a different route: #32412 (commit 628aaea) changed docker/main-wrapper.sh's shebang from #!/bin/sh to #!/command/with-contenv sh, which sources /run/s6/container_environment/ per-script rather than globally preserving env. Same outcome with tighter blast radius (no global s6 behavior change). #33481 then extended the same with-contenv-sources-env-plus-explicit-HOME-reset pattern to the supervised dashboard and dynamic-gateway run scripts.

Closing as superseded — the deployment scenario you described should work on current main. Please re-open if you hit it on a freshly built image from latest main.

@benbarclay benbarclay closed this May 28, 2026
Bartok9 added a commit to Bartok9/hermes-agent that referenced this pull request May 29, 2026
…NousResearch#34192)

NousResearch#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup
with:

  /usr/bin/tini: No such file or directory

The image moved from tini to s6-overlay as PID 1 (/init) earlier in
2026. Orchestration templates that still pin /usr/bin/tini as the
entrypoint, like the Hostinger Hermes WebUI catalog, have no
binary to exec and the container crashes immediately.

Hermes has no control over the Hostinger catalog template, but we can
make the image backward-compatible by symlinking /usr/bin/tini -> /init
during the s6-overlay install step. External wrappers that exec
/usr/bin/tini will land on the same s6-overlay reaper they would have
landed on if they'd used the canonical /init entrypoint.

The image's own ENTRYPOINT continues to be /init verbatim; the shim
is purely for legacy external wrappers, not for the image's own
runtime path. Once affected catalogs are updated, the symlink can be
removed.

Other issues NousResearch#34192 raises that are NOT addressed by this PR:

  * Problem #2 (UID 1024 vs 10000 mismatch): already addressed by
    NousResearch#32412 (with-contenv shebangs); NousResearch#33148
    (S6_KEEP_ENV=1) targeted the same root cause but was closed as
    superseded. The Hostinger template likely needs to update its
    env-var propagation.

  * Problem #3 (incompatible session formats): RFC for pluggable
    SessionDB is tracked in NousResearch#23717.

  * Problem #4 (Telegram polling conflict): an operations problem on
    Hostinger's side, not in this codebase.

This PR is scoped to the one issue that can be fixed inside
Dockerfile: the missing /usr/bin/tini binary.

Tests (3 in test_dockerfile_tini_compat_shim.py):

  - test_tini_compat_symlink_present
    Guard: the symlink line must exist in Dockerfile.
  - test_tini_compat_comment_explains_why
    The NousResearch#34192 anchor comment must be present so future readers know
    why the shim is there (avoid accidental removal).
  - test_entrypoint_still_init_not_tini
    Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is
    only for external wrappers.

Refs: NousResearch#34192
Partial fix: addresses the immediate tini-binary crash. Catalog-side
fixes still needed by Hostinger for the UID and session-format
problems documented in the issue.

Co-authored-by: Cursor <cursoragent@cursor.com>
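
The compatibility shim described in the commit message above amounts to a single symlink. A minimal sketch follows, assuming it runs as a build step in the image. `install_tini_shim` and its optional root prefix are hypothetical and exist only so the step can be tried in a scratch directory; the actual Dockerfile line is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: point the legacy tini path at s6-overlay's /init.
# The optional first argument is a root prefix for trying this outside
# an image; inside the image build it would be left empty.
install_tini_shim() {
    root="${1:-}"
    mkdir -p "$root/usr/bin"
    # -f replaces an existing entry; -n avoids following an existing symlink.
    ln -sfn /init "$root/usr/bin/tini"
}

# Example call in a scratch directory: install_tini_shim /tmp/shim-test
```

The link target stays the absolute path `/init`, so a wrapper that execs `/usr/bin/tini` resolves to the same entrypoint the image already uses.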
benbarclay pushed a commit that referenced this pull request Jun 1, 2026
…#34192) (#34382)

JoeKowal pushed a commit to JoeKowal/hermes-agent that referenced this pull request Jun 4, 2026
…NousResearch#34192) (NousResearch#34382)

Labels

area/docker Docker image, Compose, packaging backend/docker Docker container execution P1 High — major feature broken, no workaround type/bug Something isn't working

5 participants