fix(docker): boot non-root containers, skip the s6-setuidgid drop when already unprivileged. by IAvecilla · Pull Request #34837 · NousResearch/hermes-agent

IAvecilla · 2026-05-29T20:05:33Z

What does this PR do?

Fixes the s6-overlay boot loop that hits any container started as a non-root user.
The container drops privileges to the unprivileged hermes user via s6-setuidgid hermes <cmd> in every boot script. s6-setuidgid calls setgroups(), which requires CAP_SETGID. A container started as root has it; a container started non-root does not, so every s6-setuidgid invocation dies with:

s6-applyuidgid: fatal: unable to set supplementary group list: Operation not permitted

The cont-init hooks exit 111 and the supervised services crash-loop, so the container never finishes booting.
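To confirm that a given container really lacks CAP_SETGID, one option is to inspect the effective capability mask the kernel reports. The sketch below is not part of this PR (the PR tests the UID, not capabilities); `has_cap_setgid` is a hypothetical helper name, and it assumes a Linux `/proc` layout where CAP_SETGID is capability number 6.

```shell
#!/bin/sh
# Illustrative only: has_cap_setgid is a hypothetical helper, not code from this PR.
# $1 is a hexadecimal capability mask, as printed on the CapEff: line of
# /proc/<pid>/status. CAP_SETGID is capability number 6 on Linux.
has_cap_setgid() {
    [ $(( (0x$1 >> 6) & 1 )) -eq 1 ]
}

# Example use inside a container:
#   mask=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
#   if has_cap_setgid "$mask"; then echo "setgroups() can succeed"; fi
```

An all-zero mask makes the helper return non-zero, which is the situation where the s6-setuidgid call fails as described above.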

The fix guards each privilege drop: if already non-root, run the command directly; only call s6-setuidgid when we're root (there's something to drop). This restores the v0.14 behavior for non-root containers while leaving root containers byte-for-byte unchanged.

Related Issue

Fixes #34648

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

All changes apply the same guard: [ "$(id -u)" = 0 ] → only drop via s6-setuidgid when root; otherwise exec directly.

docker/main-wrapper.sh: add a drop() helper and route the three CMD exec paths through it.
docker/stage2-hook.sh: add an as_hermes() helper; route the four inline drops (mkdir / tee / cp / skills-sync) through it.
docker/cont-init.d/02-reconcile-profiles: guard the container_boot drop.
docker/s6-rc.d/dashboard/run: guard the dashboard drop (command/flags unchanged).
hermes_cli/service_manager.py: guard the generated per-profile gateway run and log run scripts (_render_gateway_run / _render_log_run) — these were the second set of drops, emitted at runtime.
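The guard applied at each of these sites can be sketched roughly as follows. Only the helper names (`drop`, `as_hermes`), the `id -u` test, and the `s6-setuidgid hermes` call come from the description above; the function bodies are an assumption about their shape, not the merged code.

```shell
#!/bin/sh
# Sketch of the guard described in this PR; bodies are assumed, not copied.

# as_hermes: run a command as the hermes user and return to the caller.
as_hermes() {
    if [ "$(id -u)" = 0 ]; then
        # Root: there is a privilege to drop, so go through s6-setuidgid.
        s6-setuidgid hermes "$@"
    else
        # Already unprivileged: setgroups() would fail, so run directly.
        "$@"
    fi
}

# drop: exec variant for the final CMD hand-off (replaces the shell).
drop() {
    if [ "$(id -u)" = 0 ]; then
        exec s6-setuidgid hermes "$@"
    else
        exec "$@"
    fi
}
```

In this sketch the root branch is the same `s6-setuidgid hermes` call as before, so root containers behave as they did; only the non-root branch is new.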

Security note

This does not skip the privilege drop for root containers. The guard only no-ops the drop when the container is already running as a non-root user, where setgroups() is both impossible (no CAP_SETGID) and unnecessary (we're already unprivileged). There is no path where a root process avoids dropping to hermes. No privilege escalation, no change to network exposure, and the dashboard's auth (OAuth gate, replay-secret check, --insecure default) is untouched. Only the OS user the process runs as changes, and only in the non-root case the operator explicitly opted into.

How to Test

Reproduce (pre-fix): docker run --rm --user 10000:10000 <pre-fix v0.15 image> → boot loops with s6-applyuidgid: ... Operation not permitted.
Non-root (fixed): same --user 10000:10000 run → boots clean, no Operation not permitted, no cont-init exited 111, services start and stay up.
Root (unchanged): default docker run (root) → boots normally and the workload still runs as hermes (UID 10000) via s6-setuidgid.
Unit tests: pytest tests/hermes_cli/test_service_manager.py tests/test_docker_home_override_scripts.py -q — the generated-script assertions still pass (the exec s6-setuidgid hermes … lines are retained as the root branch).
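The non-root check above can be scripted by scanning a captured boot log for the two failure signatures quoted in this PR. This is a minimal sketch: `boot_log_is_clean` is a hypothetical helper, and the `exited 111` pattern is deliberately loose, so it may also match unrelated lines.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a captured boot log contains neither
# failure signature quoted in this PR.
boot_log_is_clean() {
    if grep -qE 'Operation not permitted|exited 111' "$1"; then
        return 1
    fi
    return 0
}

# Example use (container name is a placeholder):
#   docker logs <container> > boot.log 2>&1
#   boot_log_is_clean boot.log && echo "boot looks clean"
```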

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits.
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/ -q and all tests pass.
I've added tests for my changes
I've tested on my platform.

Documentation & Housekeeping

I've updated relevant documentation — N/A (inline comments only)
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md / AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact — N/A (container-only; root path unchanged, only adds a non-root fallback)
I've updated tool descriptions/schemas if I changed tool behavior — N/A

alt-glitch · 2026-05-29T20:21:38Z

Competing with #34684 (same issue #34648). Both fix s6-setuidgid boot loop for non-root containers. This PR has broader scope (guards all drop sites including service_manager.py generated scripts). See also merged #34407, #33078, #32412 for prior s6 fixes in the same area.

…ar guidance (#38579)

`docker run --user $(id -u):$(id -g)` was a tini-era trick to make container-written files match the host user. Under s6-overlay it no longer works: the bootstrap (UID remap, volume + build-tree chown, config seeding) needs root, and the baked image dirs (/opt/data, /opt/hermes/.venv, ui-tui, node_modules) are owned by the hermes build UID (10000). A pinned arbitrary UID can't write them, so the runtime fails with EACCES on a bind mount or hard-crashes on a named volume (Docker inits the volume from the image as 10000; the non-root start can't even `cd /opt/data`, and the profile reconciler dies with PermissionError on gateway_state.json).

Detect that start early in both the cont-init hook (stage2-hook.sh) and the CMD wrapper (main-wrapper.sh) and fail fast with actionable guidance pointing at the supported path: root start + HERMES_UID/HERMES_GID (or the PUID/PGID aliases), which remaps the hermes user and chowns the volume. That gives the same host-UID-matching outcome --user was used for, without breaking s6.

The guard fires only when the current UID is neither root nor the hermes UID. This preserves the supported non-root start from #34648/#34837 (running with `--user 10000:10000`, i.e. pinned to the hermes UID itself), which is unaffected. Only the arbitrary-UID variant that #34837 never actually made writable is rejected.

Verified live across five scenarios (built image, bind + named volume):
arbitrary --user on bind -> rejected with guidance, hermes does not run
arbitrary --user on named volume -> guidance shown, no raw 'can't cd' crash
--user 10000:10000 -> boots
root + HERMES_UID=4242 remap -> boots, guard not tripped
default root start -> boots

Pre-fix control reproduces the raw PermissionError + 'can't cd' crash with no guidance.
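The "neither root nor the hermes UID" condition from this follow-up commit can be illustrated with a small check. This is a sketch under assumptions: `uid_is_supported` and its message text are hypothetical and are not the code from #38579; the only details taken from the commit message are the rule itself, the 10000 build UID, and the HERMES_UID/HERMES_GID guidance.

```shell
#!/bin/sh
# Sketch only: uid_is_supported and its message are hypothetical.
# $1: current UID, $2: hermes UID (10000 in the baked image, per the commit message).
uid_is_supported() {
    if [ "$1" != 0 ] && [ "$1" != "$2" ]; then
        # Arbitrary non-root UID: fail fast with guidance instead of crashing later.
        echo "unsupported --user $1: start as root and set HERMES_UID/HERMES_GID instead" >&2
        return 1
    fi
    return 0
}

# Example use early in a boot script:
#   uid_is_supported "$(id -u)" 10000 || exit 1
```

A root start with a HERMES_UID remap passes this check because the current UID is still 0 when the check runs.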


Remove privilege drop when you never ran as root

0eb26fe

alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and area/docker (Docker image, Compose, packaging) labels May 29, 2026

benbarclay merged commit 380ce47 into NousResearch:main Jun 1, 2026
25 checks passed

JoeKowal pushed a commit to JoeKowal/hermes-agent that referenced this pull request Jun 4, 2026

Remove privilege drop when you never ran as root (NousResearch#34837)

0a9d9df

github-actions Bot mentioned this pull request Jun 6, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.29.2 to v2026.6.5 Docker-Hub-sirmark/docker-hermes-agent#9

Merged

benbarclay merged 1 commit into NousResearch:main from IAvecilla:fix/non-root-s6-boot-loop


3 participants
