fix(docker): chown gateway after UID remap#37928

Closed
sarvesh1327 wants to merge 1 commit into
NousResearch:mainfrom
sarvesh1327:fix/stage2-gateway-uid-remap-chown

Conversation

@sarvesh1327

Contributor

Summary

  • Track whether the Docker stage2 hook remapped the hermes UID/GID and run targeted ownership repair in that case, even when $HERMES_HOME already matches the new UID.
  • Include /opt/hermes/gateway in the runtime-writable install trees chowned by both the Dockerfile and stage2 hook so gateway Python cache/runtime artifacts stay writable after UID remap.
  • Add regression tests for the UID/GID remap ownership condition and gateway tree allowlist.
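
A minimal sketch of what an allowlist regression test of this kind could look like. The script excerpt, the `REQUIRED_TREES` set, and both helper names are hypothetical stand-ins, not the PR's actual test code. The exact paths of the sibling trees under `/opt/hermes` are also assumed.

```python
import re

# Hypothetical excerpt standing in for the chown line in docker/stage2-hook.sh.
# The real script's wording and tree paths may differ.
HOOK_SNIPPET = """
chown -R hermes:hermes \\
    /opt/hermes/.venv \\
    /opt/hermes/ui-tui \\
    /opt/hermes/node_modules \\
    /opt/hermes/gateway
"""

# Assumed allowlist of runtime-writable install trees.
REQUIRED_TREES = {
    "/opt/hermes/.venv",
    "/opt/hermes/ui-tui",
    "/opt/hermes/node_modules",
    "/opt/hermes/gateway",
}


def chowned_trees(script_text: str) -> set:
    """Collect every /opt/hermes/<tree> path mentioned in the script text."""
    return set(re.findall(r"/opt/hermes/[A-Za-z0-9_.-]+", script_text))


def missing_trees(script_text: str) -> set:
    """Return the required trees that the script text never mentions."""
    return REQUIRED_TREES - chowned_trees(script_text)


if __name__ == "__main__":
    # An empty set means every required tree is covered by the snippet.
    print(missing_trees(HOOK_SNIPPET))
```

A text-level check like this only guards against a tree being dropped from the list. It does not prove the chown runs at boot.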

Fixes #27221.

Test Plan

  • python3 -m pytest -o addopts='' tests/tools/test_stage2_hook_install_dir_chown.py tests/tools/test_dockerfile_node_modules_perms.py tests/tools/test_stage2_hook_toplevel_chown.py tests/tools/test_stage2_hook_puid_pgid.py -q
  • python3 -m ruff check tests/tools/test_stage2_hook_install_dir_chown.py tests/tools/test_dockerfile_node_modules_perms.py
  • sh -n docker/stage2-hook.sh
  • python3 -m compileall -q tests/tools/test_stage2_hook_install_dir_chown.py tests/tools/test_dockerfile_node_modules_perms.py

@alt-glitch added the type/bug, area/docker, backend/docker, and P2 labels on Jun 3, 2026
benbarclay added a commit that referenced this pull request Jun 4, 2026
Salvage of #37928 (@sarvesh1327), reduced to the still-needed delta.

`/opt/hermes/gateway` is a runtime-writable Python package: on first import
the supervised gateway writes `__pycache__` beneath it, and the image does
not set PYTHONDONTWRITEBYTECODE. When HERMES_UID/PUID is remapped at boot
(e.g. Unraid 99), `usermod -u` only re-chowns the hermes home dir; the build
trees under /opt/hermes keep the build-time UID (10000). main already chowns
`.venv`, `ui-tui`, and `node_modules` on remap (#38556) but missed `gateway`,
so the remapped gateway hits EACCES writing `__pycache__` (#27221).
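
The bytecode-cache behavior described above can be reproduced outside the image. The sketch below is an illustration, not code from this repository: it imports a throwaway package from a temporary directory and reports whether `__pycache__` appeared beneath it. The function and package names are hypothetical. It assumes `PYTHONPYCACHEPREFIX` is unset, since that variable redirects the cache elsewhere.

```python
import importlib
import os
import sys
import tempfile


def import_writes_pycache(dont_write_bytecode: bool) -> bool:
    """Import a throwaway package; report whether __pycache__ appeared under it."""
    with tempfile.TemporaryDirectory() as root:
        # Distinct name per mode so a cached module is never reused.
        pkg_name = f"demo_gateway_{int(dont_write_bytecode)}"
        pkg_dir = os.path.join(root, pkg_name)
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
            fh.write("VALUE = 1\n")

        previous = sys.dont_write_bytecode
        # Same effect as setting PYTHONDONTWRITEBYTECODE before startup.
        sys.dont_write_bytecode = dont_write_bytecode
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            importlib.import_module(pkg_name)
        finally:
            sys.path.remove(root)
            sys.modules.pop(pkg_name, None)
            sys.dont_write_bytecode = previous
        return os.path.isdir(os.path.join(pkg_dir, "__pycache__"))


if __name__ == "__main__":
    print("default:", import_writes_pycache(False))
    print("dont_write_bytecode:", import_writes_pycache(True))
```

With bytecode writing enabled, the cache directory lands inside the package directory itself, which is why the owner of `/opt/hermes/gateway` matters once the runtime UID changes.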

Add `/opt/hermes/gateway` to both chown sites — the Dockerfile build-time
`chown -R hermes:hermes` line and the stage2-hook build-tree repair — so it
tracks the remapped UID like the sibling trees.

Differs from #37928 as submitted: dropped the `uid_gid_remapped` flag and the
`|| [ "$uid_gid_remapped" = true ]` chown gate. main's #38556 already solved
that half, and more correctly — it probes the actual tree ownership
(`venv_owner != actual_hermes_uid`) rather than tracking same-boot remaps,
which also catches pre-existing ownership drift and stays idempotent. Keeping
#37928's flag would regress that. The salvage is the `gateway`-tree addition
only.
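
To illustrate why probing ownership is idempotent where a same-boot flag is not, here is a small model of the comparison. It is a sketch under assumptions: the source names only the `venv_owner != actual_hermes_uid` check, so the per-tree probing and both function names are hypothetical. The real hook is a shell script, not Python.

```python
import os
import tempfile


def tree_needs_chown(tree: str, expected_uid: int) -> bool:
    """Return True when the tree's top-level owner differs from expected_uid."""
    return os.stat(tree).st_uid != expected_uid


def repair_plan(trees, expected_uid):
    """List the existing trees whose owner has drifted from expected_uid."""
    return [
        tree
        for tree in trees
        if os.path.isdir(tree) and tree_needs_chown(tree, expected_uid)
    ]


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    owner = os.stat(demo).st_uid
    # Owner already matches: nothing to repair, however often this runs.
    print(repair_plan([demo], owner))
    # Owner differs (drift or a fresh remap): the tree is selected.
    print(repair_plan([demo], owner + 1))
```

Because the decision depends only on the on-disk owner, a tree that drifted on an earlier boot is still selected, and a tree that already matches is skipped without tracking any state.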

Verified end-to-end against a real image build: on baseline main a remap to
UID 99 leaves `gateway` owned by 10000 and a write as uid 99 fails EACCES;
with this change `gateway` is chowned to 99:100 and the write succeeds, while
the default-uid (no-remap) path is unchanged.

Fixes #27221.

Co-authored-by: Sarvesh <sarveshagl1327@gmail.com>
benbarclay added a commit that referenced this pull request Jun 4, 2026
…#38655)

@benbarclay

Collaborator

Thanks for this, @sarvesh1327 — the still-needed part of your fix landed via #38655 (you're preserved as Co-authored-by). Fixes #27221.

I salvaged the /opt/hermes/gateway chown half: main already chowns .venv, ui-tui, and node_modules on a UID remap but was missing gateway, so after a remap (e.g. Unraid 99) the supervised gateway hit EACCES writing __pycache__ there. #38655 adds gateway to both chown sites (Dockerfile build-time + stage2 runtime repair).

I dropped the uid_gid_remapped flag and the || [ "$uid_gid_remapped" = true ] gate, because main already solved that half via #38556 — and a bit more robustly: it probes the actual tree ownership (venv_owner != actual_hermes_uid) rather than tracking same-boot remaps, so it also catches pre-existing ownership drift and stays idempotent. Keeping the flag would have regressed that, so the salvage is the gateway-tree addition only.

Verified end-to-end against a real image build: on baseline main a remap to UID 99 left gateway owned by 10000 and a write as uid 99 failed EACCES; with the change gateway is chowned to 99:100 and the write succeeds, while the default-uid (no-remap) path is unchanged.

Closing as incorporated and extended by #38655. 🙏

@benbarclay benbarclay closed this Jun 4, 2026
Yuki-14544869 pushed a commit to Yuki-14544869/hermes-agent that referenced this pull request Jun 4, 2026
…earch#37928) (NousResearch#38655)

davidgut1982 pushed a commit to davidgut1982/hermes-agent that referenced this pull request Jun 5, 2026
…earch#37928) (NousResearch#38655)

changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…earch#37928) (NousResearch#38655)

Development

Successfully merging this pull request may close these issues.

[Bug]: entrypoint.sh misses chown for ui-tui/ and gateway/ when HERMES_UID is remapped

3 participants