
fix(docker): chown build trees on UID remap independently of $HERMES_HOME (#35027 regression)#38556

Merged
benbarclay merged 1 commit into main from fix/docker-build-tree-chown-remap on Jun 4, 2026

@benbarclay (Collaborator)

Problem

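After a HERMES_UID/PUID/HERMES_GID/PGID remap, the build trees under
/opt/hermes (.venv, ui-tui, node_modules) are left owned by the
build-time UID (10000) instead of the remapped runtime UID. The remapped
hermes user therefore cannot write to them, which breaks lazy_deps
'uv pip install' of platform extras (#15012, #21100) and the TUI esbuild
rebuild into ui-tui/dist (#28851).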

This is the default supported path for NAS users (Synology / unRAID / UGOS)
running -e PUID=$(id -u) -e PGID=$(id -g) per the docs.

Root cause — regression from #35027

The stage2 hook re-chowns those build trees when the UID is remapped. #35027
("skip unnecessary boot chown when volume ownership already matches remapped
UID") folded that chown under the $HERMES_HOME ownership gate:

needs_chown=false
if [ "$(stat -c %u "$HERMES_HOME")" != "$actual_hermes_uid" ]; then needs_chown=true; fi
if [ "$needs_chown" = true ]; then
    chown ... "$HERMES_HOME" subdirs ...
    chown -R hermes:hermes "$INSTALL_DIR/.venv" "$INSTALL_DIR/ui-tui" "$INSTALL_DIR/node_modules"
fi

But usermod -u <new> hermes re-chowns the hermes home dir
($HERMES_HOME == /opt/data) to the new UID as a side effect. So after
any remap, stat $HERMES_HOME already matches actual_hermes_uid,
needs_chown stays false, and the build-tree chown is silently skipped.
The build trees live under /opt/hermes, not $HERMES_HOME, so gating them
on $HERMES_HOME ownership is simply the wrong key.
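To make the regression concrete, here is a minimal shell model of the two gating decisions in the remap scenario: the #35027 gate keyed on $HERMES_HOME, and a gate keyed on the venv owner (the approach this PR adopts in the Fix section). The helper names are hypothetical, and owners are passed in as plain UIDs instead of being read with `stat`, so this sketches only the decision logic, not the hook itself.

```sh
# Hypothetical model of the two gates; owners are passed in as plain UIDs.

# #35027 gate: chown the build trees only if $HERMES_HOME is mis-owned.
old_gate_chowns_build_trees() {
  local home_owner=$1 hermes_uid=$2
  [ "$home_owner" != "$hermes_uid" ]
}

# Fixed gate: chown the build trees only if the venv itself is mis-owned.
new_gate_chowns_build_trees() {
  local venv_owner=$1 hermes_uid=$2
  [ -n "$venv_owner" ] && [ "$venv_owner" != "$hermes_uid" ]
}

# Remap scenario: `usermod -u 4242 hermes` has already re-chowned the home
# dir, but /opt/hermes/.venv still carries the build-time UID 10000.
hermes_uid=4242
home_owner=4242
venv_owner=10000

if old_gate_chowns_build_trees "$home_owner" "$hermes_uid"; then
  echo "old gate: chown"
else
  echo "old gate: skip"    # this branch runs: the build trees stay 10000
fi

if new_gate_chowns_build_trees "$venv_owner" "$hermes_uid"; then
  echo "new gate: chown"   # this branch runs
else
  echo "new gate: skip"
fi
```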

Fix

Gate the build-tree chown independently, probing the venv owner directly:

venv_owner=$(stat -c %u "$INSTALL_DIR/.venv" 2>/dev/null || echo "")
if [ -n "$venv_owner" ] && [ "$venv_owner" != "$actual_hermes_uid" ]; then
    chown -R hermes:hermes "$INSTALL_DIR/.venv" "$INSTALL_DIR/ui-tui" "$INSTALL_DIR/node_modules"
fi

The check is independent of $HERMES_HOME and idempotent across restarts:
once ownership is settled, the recursive chown is skipped.
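A minimal sketch of how the probe behaves on a real directory, assuming GNU coreutils `stat` (the `-c` flag, as used in the hook). The wrapper `needs_build_tree_chown` and the `report` helper are hypothetical. The probe succeeds only when the tree exists and is owned by some other UID, so both a missing `.venv` and an already-repaired one are skipped, which is what keeps repeated boots cheap.

```sh
# Hypothetical wrapper around the probe used in the fix (GNU stat assumed).
needs_build_tree_chown() {
  local tree=$1 hermes_uid=$2 owner
  # A missing tree makes stat fail; the fallback yields an empty owner.
  owner=$(stat -c %u "$tree" 2>/dev/null || echo "")
  [ -n "$owner" ] && [ "$owner" != "$hermes_uid" ]
}

# Print the decision for one case; $1 is a label, the rest go to the probe.
report() {
  local label=$1
  shift
  if needs_build_tree_chown "$@"; then
    echo "$label: chown"
  else
    echo "$label: skip"
  fi
}

demo_dir=$(mktemp -d)   # owned by the current user
me=$(id -u)
report "owned by runtime uid" "$demo_dir" "$me"           # skip
report "owned by another uid" "$demo_dir" "$((me + 1))"   # chown
report "tree missing" "$demo_dir/missing" "$me"           # skip
rmdir "$demo_dir"
```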

Verification

Unit tests (tests/tools/test_stage2_hook_build_tree_chown.py, 4 tests)

The tests extract the block and run it with stat and chown stubbed. They
assert that the chown fires in the remap scenario (venv_owner=10000,
hermes_uid=4242), that it is skipped once the tree is already owned, and
that it is not keyed on $HERMES_HOME.
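The real tests are Python; what follows is only a shell analogue of the same stubbing idea, assuming a shell where functions shadow external commands. The `stat`/`chown` stubs and the `FAKE_*` variables are hypothetical. The stub `chown` applies its effect to the fake venv owner, so calling the block twice models a first boot after the remap followed by a restart.

```sh
# Hypothetical fake filesystem state for the remap scenario.
INSTALL_DIR=/opt/hermes
HERMES_HOME=/opt/data
actual_hermes_uid=4242
FAKE_HOME_OWNER=4242    # usermod already re-chowned the home dir
FAKE_VENV_OWNER=10000   # build-time UID left on the venv
chown_calls=0

# Stub: called as `stat -c %u PATH`; answers from the fake ownership table.
stat() {
  case "$3" in
    "$HERMES_HOME") echo "$FAKE_HOME_OWNER" ;;
    "$INSTALL_DIR/.venv") echo "$FAKE_VENV_OWNER" ;;
    *) return 1 ;;
  esac
}

# Stub: count the call and apply its effect to the fake venv owner.
chown() {
  chown_calls=$((chown_calls + 1))
  FAKE_VENV_OWNER=$actual_hermes_uid
}

# The fixed block from stage2-hook.sh, as quoted in this PR.
build_tree_block() {
  venv_owner=$(stat -c %u "$INSTALL_DIR/.venv" 2>/dev/null || echo "")
  if [ -n "$venv_owner" ] && [ "$venv_owner" != "$actual_hermes_uid" ]; then
    chown -R hermes:hermes "$INSTALL_DIR/.venv" "$INSTALL_DIR/ui-tui" "$INSTALL_DIR/node_modules"
  fi
}

build_tree_block   # first boot after the remap: fires
build_tree_block   # restart: ownership settled, skipped
echo "chown calls: $chown_calls"
```

Because the home directory already matches the runtime UID here, the #35027 gate would have skipped this chown; the fixed block never consults $HERMES_HOME at all.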

Live E2E (built the image, fresh named volume, HERMES_UID=HERMES_GID=4242)

A harness boots the image, asserts ownership of all three build trees, and
runs a real uv pip install into the venv as the remapped hermes user, then
restarts and asserts idempotency.

| Check | pre-fix image | post-fix image |
|---|---|---|
| build-tree chown fires | ✗ (skipped) | ✓ |
| .venv / ui-tui / node_modules owner | 10000 | 4242 |
| uv pip install into venv | ✗ EACCES | ✓ OK |
| chown fires exactly once (idempotent on restart) | ✗ (0) | ✓ (1) |
| result | 0/6 | 6/6 |

Scope

docker/ lane only: docker/stage2-hook.sh plus a new contract test. No
runtime or Python behavior change.

Notes

This fixes the HERMES_UID/PUID remap path only. Running with
docker run --user $(id -u):$(id -g), which bypasses the root-start
usermod/chown machinery entirely and hard-crashes on a named volume, is a
separate issue tracked elsewhere and not addressed here.

…HOME (#35027 regression)

The stage2 hook gates the recursive chown of the build trees under
$INSTALL_DIR (.venv, ui-tui, node_modules) so a HERMES_UID/PUID remap
leaves them writable by the new runtime UID — needed for lazy_deps
'uv pip install' of platform extras (#15012, #21100) and the TUI esbuild
rebuild into ui-tui/dist (#28851).

#35027 folded that chown under the $HERMES_HOME ownership check
('stat $HERMES_HOME != hermes_uid'). But 'usermod -u <new> hermes'
re-chowns the hermes home dir ($HERMES_HOME == /opt/data) to the new UID
as a side effect, so after any remap that stat is already satisfied and
needs_chown is false — silently skipping the build-tree chown on the
common PUID/NAS path. The venv stays owned by the build-time UID (10000),
so lazy installs and TUI rebuilds fail with EACCES.

Probe the build trees directly instead: chown only when /opt/hermes/.venv
is not already owned by the runtime hermes UID. Independent of
$HERMES_HOME ownership, idempotent across restarts.

Verified live: built the image, booted with HERMES_UID/HERMES_GID on a
fresh named volume, confirmed .venv/ui-tui/node_modules end up owned by
the remapped UID and 'uv pip install' into the venv succeeds; confirmed
the recursive chown fires once and is skipped on restart.
@benbarclay benbarclay merged commit 5446153 into main Jun 4, 2026
25 checks passed
@benbarclay benbarclay deleted the fix/docker-build-tree-chown-remap branch June 4, 2026 00:17
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists area/docker Docker image, Compose, packaging backend/docker Docker container execution labels Jun 4, 2026
benbarclay added a commit that referenced this pull request Jun 4, 2026
Salvage of #37928 (@sarvesh1327), reduced to the still-needed delta.

`/opt/hermes/gateway` is a runtime-writable Python package: on first import
the supervised gateway writes `__pycache__` beneath it, and the image does
not set PYTHONDONTWRITEBYTECODE. When HERMES_UID/PUID is remapped at boot
(e.g. Unraid 99), `usermod -u` only re-chowns the hermes home dir; the build
trees under /opt/hermes keep the build-time UID (10000). main already chowns
`.venv`, `ui-tui`, and `node_modules` on remap (#38556) but missed `gateway`,
so the remapped gateway hits EACCES writing `__pycache__` (#27221).
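The underlying write is ordinary CPython behavior: importing a package from source caches compiled bytecode in a `__pycache__` directory next to it unless bytecode writing is disabled, for example with PYTHONDONTWRITEBYTECODE. The sketch below assumes `python3` is on PATH and uses a throwaway package named `demo_pkg` (hypothetical) in a scratch directory; under a remap, the equivalent directory is /opt/hermes/gateway, which is why it must be writable by the runtime UID.

```sh
# Hypothetical helper: succeeds if importing a fresh package leaves a
# __pycache__ directory behind. $1 is "on" (default behavior) or "off"
# (PYTHONDONTWRITEBYTECODE=1). Assumes python3 is on PATH.
pycache_written() {
  local mode=$1 work found=1
  work=$(mktemp -d)
  mkdir "$work/demo_pkg"
  : > "$work/demo_pkg/__init__.py"
  if [ "$mode" = off ]; then
    (cd "$work" && PYTHONDONTWRITEBYTECODE=1 python3 -c 'import demo_pkg')
  else
    # -E ignores inherited PYTHON* variables so they cannot mask the default.
    (cd "$work" && python3 -E -c 'import demo_pkg')
  fi
  [ -d "$work/demo_pkg/__pycache__" ] && found=0
  rm -rf "$work"
  return "$found"
}

if pycache_written on; then echo "default import: __pycache__ written"; fi
if ! pycache_written off; then echo "PYTHONDONTWRITEBYTECODE=1: nothing written"; fi
```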

Add `/opt/hermes/gateway` to both chown sites — the Dockerfile build-time
`chown -R hermes:hermes` line and the stage2-hook build-tree repair — so it
tracks the remapped UID like the sibling trees.

Differs from #37928 as submitted: dropped the `uid_gid_remapped` flag and the
`|| [ "$uid_gid_remapped" = true ]` chown gate. main's #38556 already solved
that half, and more correctly — it probes the actual tree ownership
(`venv_owner != actual_hermes_uid`) rather than tracking same-boot remaps,
which also catches pre-existing ownership drift and stays idempotent. Keeping
#37928's flag would regress that. The salvage is the `gateway`-tree addition
only.

Verified end-to-end against a real image build: on baseline main a remap to
UID 99 leaves `gateway` owned by 10000 and a write as uid 99 fails EACCES;
with this change `gateway` is chowned to 99:100 and the write succeeds, while
the default-uid (no-remap) path is unchanged.

Fixes #27221.

Co-authored-by: Sarvesh <sarveshagl1327@gmail.com>
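A hypothetical dry-run of the stage2-hook repair after this change, assuming the rest of the block is unchanged: the same venv-owner gate as #38556, with `gateway` appended to the trees it chowns. The helper `plan_build_tree_chown` and the BUILD_TREES variable are illustrative names, not taken from stage2-hook.sh, and the command is printed rather than run, so no root is needed. The Dockerfile build-time `chown -R` line, the other site the commit touches, is not modeled.

```sh
# Hypothetical dry-run: print the chown the hook would run instead of running it.
INSTALL_DIR=/opt/hermes
BUILD_TREES=".venv ui-tui node_modules gateway"   # gateway is the new entry

plan_build_tree_chown() {
  local venv_owner=$1 hermes_uid=$2 t paths=""
  # Same gate as #38556: act only when the venv is owned by another UID.
  if [ -n "$venv_owner" ] && [ "$venv_owner" != "$hermes_uid" ]; then
    for t in $BUILD_TREES; do paths="$paths $INSTALL_DIR/$t"; done
    echo "chown -R hermes:hermes$paths"
  fi
}

plan_build_tree_chown 10000 99   # remap to UID 99: prints the chown, gateway included
plan_build_tree_chown 99 99      # already repaired: prints nothing
```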
benbarclay added a commit that referenced this pull request Jun 4, 2026
…#38655)

Yuki-14544869 pushed a commit to Yuki-14544869/hermes-agent that referenced this pull request Jun 4, 2026
…HOME (NousResearch#35027 regression) (NousResearch#38556)

Yuki-14544869 pushed a commit to Yuki-14544869/hermes-agent that referenced this pull request Jun 4, 2026
…earch#37928) (NousResearch#38655)

j2h4u added a commit to j2h4u/hermes-agent that referenced this pull request Jun 4, 2026
Brings deploy up to upstream/main (468 commits, 2026-05-31 → 06-04).

All three carried docker patches are now superseded by upstream and dropped
— merge result is byte-identical to upstream/main:

- 5031c9e (chown install trees independently of volume ownership)
  → superseded by upstream's "Fix ownership of build trees under $INSTALL_DIR"
    block in stage2-hook.sh (NousResearch#38655/NousResearch#38556), a strict superset that also
    covers the gateway tree and documents the NousResearch#35027 gating regression.

- e1fc281 (avoid implicit chown of host hermes home)
- 810534f (silence install stamp on protected mounts)
  → superseded by the s6-overlay boot migration (feat(docker)! e0e9c89),
    which turned entrypoint.sh into a deprecated shim and moved all chown /
    setup logic into cont-init.d/01-hermes-setup (stage2-hook.sh). Upstream's
    targeted HERMES_HOME chown now chowns the dir itself, not bind-mount
    contents — exactly the behavior our patch protected.

deploy now carries no local diff vs upstream/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
davidgut1982 pushed a commit to davidgut1982/hermes-agent that referenced this pull request Jun 5, 2026
…HOME (NousResearch#35027 regression) (NousResearch#38556)

davidgut1982 pushed a commit to davidgut1982/hermes-agent that referenced this pull request Jun 5, 2026
…earch#37928) (NousResearch#38655)

changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…HOME (NousResearch#35027 regression) (NousResearch#38556)

changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…earch#37928) (NousResearch#38655)
