fix(docker): boot non-root containers, skip the s6-setuidgid drop when already unprivileged #34837
Merged
This was referenced Jun 1, 2026
benbarclay added a commit that referenced this pull request on Jun 4, 2026:

…ar guidance (#38579)

`docker run --user $(id -u):$(id -g)` was a tini-era trick to make container-written files match the host user. Under s6-overlay it no longer works: the bootstrap (UID remap, volume + build-tree chown, config seeding) needs root, and the baked image dirs (/opt/data, /opt/hermes/.venv, ui-tui, node_modules) are owned by the hermes build UID (10000). A pinned arbitrary UID can't write them, so the runtime fails with EACCES on a bind mount or hard-crashes on a named volume (Docker inits the volume from the image as 10000; the non-root start can't even `cd /opt/data`, and the profile reconciler dies with PermissionError on gateway_state.json).

Detect that start early in both the cont-init hook (stage2-hook.sh) and the CMD wrapper (main-wrapper.sh) and fail fast with actionable guidance pointing at the supported path: root start + HERMES_UID/HERMES_GID (or the PUID/PGID aliases), which remaps the hermes user and chowns the volume. That gives the same host-UID-matching outcome --user was used for, without breaking s6.

The guard fires only when the current UID is neither root NOR the hermes UID. This preserves the supported non-root start from #34648/#34837 (running with `--user 10000:10000`, i.e. pinned to the hermes UID itself), which is unaffected; only the arbitrary-UID variant that #34837 never actually made writable is rejected.

Verified live across five scenarios (built image, bind + named volume):
- arbitrary --user on bind -> rejected with guidance, hermes does not run
- arbitrary --user on named volume -> guidance shown, no raw 'can't cd' crash
- --user 10000:10000 -> boots
- root + HERMES_UID=4242 remap -> boots, guard not tripped
- default root start -> boots

Pre-fix control reproduces the raw PermissionError + 'can't cd' crash with no guidance.
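The fail-fast rule described in that commit message (reject a start whose UID is neither root nor the hermes UID) could look roughly like the sketch below. The function name `check_start_uid`, the variable `HERMES_BUILD_UID`, the message wording, and the return code are all assumptions made for illustration; the actual guard in stage2-hook.sh and main-wrapper.sh may differ.

```shell
#!/bin/sh
# Hypothetical sketch of the arbitrary-UID guard; not the merged code.
# HERMES_BUILD_UID is an illustrative name; 10000 is the build UID cited above.
HERMES_BUILD_UID=10000

# check_start_uid UID
# Returns 0 for a supported start (root, or pinned to the hermes UID).
# Returns 1 and prints guidance on stderr for any other UID.
check_start_uid() {
    uid="$1"
    if [ "$uid" = 0 ] || [ "$uid" = "$HERMES_BUILD_UID" ]; then
        return 0
    fi
    echo "Unsupported start as UID $uid: start as root and set" \
         "HERMES_UID/HERMES_GID (or PUID/PGID) instead of --user." >&2
    return 1
}
```

A boot script would presumably call it as `check_start_uid "$(id -u)" || exit 1` before doing any work that needs write access to the image directories.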
What does this PR do?
Fixes the s6-overlay boot loop that hits any container started as a non-root user.
The container drops privileges to the unprivileged `hermes` user via `s6-setuidgid hermes <cmd>` in every boot script. `s6-setuidgid` calls `setgroups()`, which requires `CAP_SETGID`. A container started as root has it; a container started non-root does not, so every `s6-setuidgid` invocation dies with `s6-applyuidgid: ... Operation not permitted`. The cont-init hooks exit 111 and the supervised services crash-loop, so the container never finishes booting.

The fix guards each privilege drop: if the process is already non-root, it runs the command directly; it calls `s6-setuidgid` only when running as root, where there is something to drop. This restores the `v0.14` behavior for non-root containers while leaving root containers byte-for-byte unchanged.

Related Issue
Fixes #34648
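The guard described under "What does this PR do?" can be sketched as a small shell helper. The name `drop()` matches the helper this PR adds to `docker/main-wrapper.sh`, but the body below is an assumption, not the merged code. `CURRENT_UID` is a hypothetical override added here only so both branches can be exercised without actually being root.

```shell
#!/bin/sh
# Hypothetical sketch of the privilege-drop guard; not the merged code.

# drop CMD [ARGS...]
# Runs CMD as the hermes user when we are root, or directly when the
# process is already unprivileged.
drop() {
    # CURRENT_UID is an illustrative test hook; by default use the real UID.
    uid="${CURRENT_UID:-$(id -u)}"
    if [ "$uid" = 0 ]; then
        # Root: there is a privilege to drop, so go through s6-setuidgid.
        exec s6-setuidgid hermes "$@"
    else
        # Non-root: setgroups() would fail without CAP_SETGID, and there is
        # nothing left to drop, so exec the command directly.
        exec "$@"
    fi
}
```

Because both branches `exec`, the helper replaces the calling shell, which is the behavior a CMD wrapper or an s6 `run` script usually wants.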
Type of Change
Changes Made
All changes apply the same guard, `[ "$(id -u)" = 0 ]`: drop via `s6-setuidgid` only when root; otherwise exec directly.

- `docker/main-wrapper.sh`: add a `drop()` helper and route the three CMD exec paths through it.
- `docker/stage2-hook.sh`: add an `as_hermes()` helper; route the four inline drops (mkdir / tee / cp / skills-sync) through it.
- `docker/cont-init.d/02-reconcile-profiles`: guard the `container_boot` drop.
- `docker/s6-rc.d/dashboard/run`: guard the dashboard drop (command/flags unchanged).
- `hermes_cli/service_manager.py`: guard the generated per-profile gateway `run` and log `run` scripts (`_render_gateway_run` / `_render_log_run`). These were the second set of drops, emitted at runtime.

Security note
This does not skip the privilege drop for root containers. The guard only no-ops the drop when the container is already running as a non-root user, where `setgroups()` is both impossible (no `CAP_SETGID`) and unnecessary (we're already unprivileged). There is no path where a root process avoids dropping to `hermes`. No privilege escalation, no change to network exposure, and the dashboard's auth (OAuth gate, replay-secret check, `--insecure` default) is untouched. Only the OS user the process runs as changes, and only in the non-root case the operator explicitly opted into.

How to Test
1. `docker run --rm --user 10000:10000 <pre-fix v0.15 image>` → boot loops with `s6-applyuidgid: ... Operation not permitted`.
2. `--user 10000:10000` run → boots clean, no `Operation not permitted`, no cont-init `exited 111`, services start and stay up.
3. `docker run` (root) → boots normally and the workload still runs as `hermes` (UID 10000) via `s6-setuidgid`.
4. `pytest tests/hermes_cli/test_service_manager.py tests/test_docker_home_override_scripts.py -q`: the generated-script assertions still pass (the `exec s6-setuidgid hermes …` lines are retained as the root branch).

Checklist
Code
- `pytest tests/ -q` and all tests pass.

Documentation & Housekeeping
- `cli-config.yaml.example` if I added/changed config keys: N/A
- `CONTRIBUTING.md` / `AGENTS.md` if I changed architecture or workflows: N/A
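As a rough aid for the "How to Test" steps, the log inspection could be scripted. The helper name `boot_log_is_clean` is hypothetical; the sketch only looks for the two failure markers named in this PR (`Operation not permitted` and `exited 111`) and says nothing about whether services actually stay up.

```shell
#!/bin/sh
# Hypothetical helper for the manual test steps; not part of this PR.

# boot_log_is_clean
# Reads a container boot log on stdin. Returns 0 when neither failure
# marker appears, and non-zero when at least one of them does.
boot_log_is_clean() {
    # grep exits 0 when it finds a marker, so invert its status.
    ! grep -Eq 'Operation not permitted|exited 111'
}
```

One possible use, with the container name left as a placeholder, is `docker logs <container> 2>&1 | boot_log_is_clean && echo "boot looks clean"`.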