fix(docker): reject unsupported --user <arbitrary-uid> start with clear guidance#38579

Merged
benbarclay merged 1 commit into main from fix/docker-user-flag-guidance
Jun 4, 2026
@benbarclay

Collaborator

Problem

docker run --user $(id -u):$(id -g) was a tini-era trick to make
container-written files match the host user. Under the current s6-overlay
image it silently breaks:

  • Bind mount → boots read-only, but every write to the baked image trees
    (/opt/hermes/.venv, ui-tui, node_modules, owned by the hermes build UID
    10000) fails with EACCES — lazy installs and TUI rebuilds are dead.
  • Named volume → hard crash on first boot:
    PermissionError: [Errno 13] Permission denied: '/opt/data/gateway_state.json'
    cont-init: 02-reconcile-profiles exited 1
    main-wrapper.sh: 35: cd: can't cd to /opt/data
    
    Docker initialises the named volume from the image as UID 10000; the
    arbitrary --user UID can't even cd into $HERMES_HOME.

Root cause: the bootstrap steps (UID remap, volume/build-tree chown, config
seeding) all require root and are skipped on a non-root start. --user with an
arbitrary UID was a casualty of the tini→s6 migration and was never made to
actually work.

Fix (Option A — redirect to the supported path)

Detect the unsupported start early — in both the cont-init hook
(stage2-hook.sh) and the CMD wrapper (main-wrapper.sh, the surface the user
sees in docker run output) — and fail fast with actionable guidance
instead of crashing on cd/EACCES downstream:

[hermes] ERROR: container started with --user 1000 (an arbitrary, non-hermes UID) — not supported.

To make container-written files match your HOST user, don't use --user.
Start as root (the default) and pass your host UID/GID instead:

    docker run -e HERMES_UID=$(id -u) -e HERMES_GID=$(id -g) ...

NAS users (Synology / unRAID / UGOS) can use the PUID/PGID aliases:

    docker run -e PUID=$(id -u) -e PGID=$(id -g) ...

The supported HERMES_UID/PUID path remaps the hermes user and chowns the
volume at boot, giving the same host-UID-matching outcome --user was used
for, without breaking the s6 supervision tree.
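
For illustration, the alias handling could be sketched as a small resolver. This is a minimal sketch, not the image's actual bootstrap code: the helper name resolve_id, the precedence of HERMES_UID/HERMES_GID over PUID/PGID, and the fallback to 10000 when neither is set are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the image): print the first non-empty value.
#   $1 = primary value (e.g. HERMES_UID)
#   $2 = alias value   (e.g. PUID)
#   $3 = fallback when neither is set (assumed here: the build UID/GID 10000)
resolve_id() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$3"
    fi
}

# Assumption: HERMES_UID/HERMES_GID take precedence over the PUID/PGID aliases.
target_uid=$(resolve_id "${HERMES_UID:-}" "${PUID:-}" 10000)
target_gid=$(resolve_id "${HERMES_GID:-}" "${PGID:-}" 10000)
echo "would remap hermes to ${target_uid}:${target_gid}"
```

Either spelling then feeds the same root-only remap and chown step, which is why the container has to start as root for it to work.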

Why this does NOT revert #34837 / re-break #34648

The guard fires only when the current UID is neither root nor the hermes UID.
#34648's supported non-root start uses user: "10000:10000", pinned to
the hermes UID itself, so cur_uid == id -u hermes and the guard skips it.
#34837 fixed the boot-loop for that case; this PR rejects only the arbitrary-UID
variant that #34837 never made writable (confirmed by reproducing the EACCES /
crash on current main).
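
The condition above can be sketched as a standalone shell function. This is a hedged illustration rather than the code in stage2-hook.sh or main-wrapper.sh: the name check_start_uid and its two-argument interface are hypothetical, and the guidance text is abbreviated from the message shown earlier.

```shell
#!/bin/sh
# Hypothetical guard sketch; the real scripts may differ.
#   $1 = UID the container was started with
#   $2 = UID of the hermes user inside the image
# Returns 0 when the start is supported, 1 (after printing guidance) when not.
check_start_uid() {
    cur_uid="$1"
    hermes_uid="$2"

    # Root start: the bootstrap can remap and chown, so let it through.
    if [ "$cur_uid" -eq 0 ]; then
        return 0
    fi
    # Pinned to the hermes UID itself (the #34648 case): also supported.
    if [ "$cur_uid" -eq "$hermes_uid" ]; then
        return 0
    fi

    # Anything else is an arbitrary UID: print the supported path on stderr.
    # The single quotes keep $(id -u) literal so the user sees the command as typed.
    {
        echo "[hermes] ERROR: container started with --user $cur_uid (an arbitrary, non-hermes UID): not supported."
        echo "Start as root (the default) and pass your host UID/GID instead:"
        echo '    docker run -e HERMES_UID=$(id -u) -e HERMES_GID=$(id -g) ...'
    } >&2
    return 1
}

# In an entrypoint this would be called roughly as:
#   check_start_uid "$(id -u)" "$(id -u hermes)" || exit 1
```

Comparing against the live UID of the hermes user, rather than a hard-coded 10000, is what lets a remapped hermes user pass the same check.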

Verification

Unit tests (tests/tools/test_stage2_hook_user_flag_guard.py, 6 tests)

Extracts the guard from each script and runs it with id stubbed; asserts
arbitrary UID → exit 1 + guidance, and root / --user <hermes-uid> /
remapped-hermes-uid → pass through. All 20 stage2 contract tests green.
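
One way to stub id without touching the real binary is to shadow it with a shell function, since functions are found before PATH lookups. The sketch below shows the idea only; the actual pytest harness is not shown in this PR description, and FAKE_UID, FAKE_HERMES_UID, guard_verdict and run_case are hypothetical names.

```shell
#!/bin/sh
# Stub: a shell function named `id` shadows the real binary inside this script.
# FAKE_UID and FAKE_HERMES_UID are hypothetical knobs set by each test case.
id() {
    if [ "${1:-}" = "-u" ] && [ "${2:-}" = "hermes" ]; then
        printf '%s\n' "$FAKE_HERMES_UID"
    elif [ "${1:-}" = "-u" ]; then
        printf '%s\n' "$FAKE_UID"
    fi
}

# Stand-in for the extracted guard: prints "reject" or "pass".
guard_verdict() {
    cur_uid=$(id -u)
    hermes_uid=$(id -u hermes)
    if [ "$cur_uid" -ne 0 ] && [ "$cur_uid" -ne "$hermes_uid" ]; then
        echo reject
    else
        echo pass
    fi
}

# Run one case: $1 = fake current UID, $2 = fake hermes UID.
run_case() {
    FAKE_UID="$1"
    FAKE_HERMES_UID="$2"
    guard_verdict
}

run_case 1000 10000    # arbitrary UID
run_case 0 10000       # root
run_case 10000 10000   # --user <hermes-uid>
run_case 4242 4242     # hermes user remapped to 4242, started as 4242
```

The four calls mirror the four cases the unit tests assert on: only the first should be rejected.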

Live E2E (built image, bind + named volume)

| scenario | expected result |
|---|---|
| --user 1000:1000 + bind mount | rejected w/ guidance, hermes does not run |
| --user 1000:1000 + named volume | guidance shown, no raw can't cd crash |
| --user 10000:10000 (hermes UID, #34648) | boots |
| root + HERMES_UID=4242 remap | boots, guard not tripped |
| default root start | boots |

Pre-fix control reproduces the raw PermissionError + can't cd crash with no
guidance.
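
To re-run the live scenarios by hand, a dry-run helper that only prints the commands may be handy. This is a sketch under assumptions: the image tag, the host path and the volume name are hypothetical, and the description does not say which volume type the last three scenarios used, so a named volume is assumed for them.

```shell
#!/bin/sh
# All three values below are hypothetical; adjust them to your own build.
IMAGE="${IMAGE:-hermes-agent:local}"    # locally built image tag
BIND="-v /srv/hermes-data:/opt/data"    # bind mount from a host directory
NAMED="-v hermes-data:/opt/data"        # Docker named volume

# Print (do not run) one `docker run` line per scenario.
print_matrix() {
    echo "docker run --rm --user 1000:1000 $BIND $IMAGE"      # expect: rejected with guidance
    echo "docker run --rm --user 1000:1000 $NAMED $IMAGE"     # expect: guidance, no raw crash
    echo "docker run --rm --user 10000:10000 $NAMED $IMAGE"   # expect: boots
    echo "docker run --rm -e HERMES_UID=4242 $NAMED $IMAGE"   # expect: boots, guard not tripped
    echo "docker run --rm $NAMED $IMAGE"                      # expect: boots
}

print_matrix
```

Piping the output to sh would execute the runs; printing first keeps the commands reviewable before anything touches a real volume.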

Scope

docker/ lane only — stage2-hook.sh + main-wrapper.sh + a contract test.
No runtime/Python behavior change.

Follow-up

This is the redirect approach. Genuinely restoring full --user <arbitrary-uid>
parity (world-writable build trees / relocated runtime state + s6 tuning) is a
larger, separate change and is intentionally out of scope here.

…ar guidance

`docker run --user $(id -u):$(id -g)` was a tini-era trick to make
container-written files match the host user. Under s6-overlay it no longer
works: the bootstrap (UID remap, volume + build-tree chown, config seeding)
needs root, and the baked image dirs (/opt/data, /opt/hermes/.venv, ui-tui,
node_modules) are owned by the hermes build UID (10000). A pinned arbitrary
UID can't write them, so the runtime fails with EACCES on a bind mount or
hard-crashes on a named volume (Docker inits the volume from the image as
10000; the non-root start can't even `cd /opt/data`, and the profile
reconciler dies with PermissionError on gateway_state.json).

Detect that start early in both the cont-init hook (stage2-hook.sh) and the
CMD wrapper (main-wrapper.sh) and fail fast with actionable guidance pointing
at the supported path: root start + HERMES_UID/HERMES_GID (or the PUID/PGID
aliases), which remaps the hermes user and chowns the volume — the same
host-UID-matching outcome --user was used for, without breaking s6.

The guard fires only when the current UID is neither root NOR the hermes UID.
This preserves the supported non-root start from #34648/#34837 (running with
`--user 10000:10000`, i.e. pinned to the hermes UID itself), which is
unaffected — only the arbitrary-UID variant that #34837 never actually made
writable is rejected.

Verified live across five scenarios (built image, bind + named volume):
arbitrary --user on bind -> rejected with guidance, hermes does not run;
arbitrary --user on named volume -> guidance shown, no raw 'can't cd' crash;
--user 10000:10000 -> boots; root + HERMES_UID=4242 remap -> boots, guard not
tripped; default root start -> boots. Pre-fix control reproduces the raw
PermissionError + 'can't cd' crash with no guidance.
@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

🔎 Lint report: fix/docker-user-flag-guidance vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9750 on HEAD, 9748 on base (🆕 +2)

🆕 New issues (2):

| Rule | Count |
|---|---|
| unresolved-import | 1 |
| no-matching-overload | 1 |

First entries:
tests/tools/test_stage2_hook_user_flag_guard.py:31: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_stage2_hook_user_flag_guard.py:84: [no-matching-overload] no-matching-overload: No overload of function `run` matches arguments

✅ Fixed issues: none

Unchanged: 5050 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working area/docker Docker image, Compose, packaging backend/docker Docker container execution P2 Medium — degraded but workaround exists labels Jun 4, 2026
@benbarclay benbarclay merged commit 343c54e into main Jun 4, 2026
25 checks passed
@benbarclay benbarclay deleted the fix/docker-user-flag-guidance branch June 4, 2026 00:51
Yuki-14544869 pushed a commit to Yuki-14544869/hermes-agent that referenced this pull request Jun 4, 2026
davidgut1982 pushed a commit to davidgut1982/hermes-agent that referenced this pull request Jun 5, 2026
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026