Skip to content

fix(docker): chown hermes-owned top-level state files on boot (salvage #35098)#36236

Merged
benbarclay merged 1 commit into
mainfrom
fix/chown-root-owned-top-level-files-35098
Jun 1, 2026
Merged

fix(docker): chown hermes-owned top-level state files on boot (salvage #35098)#36236
benbarclay merged 1 commit into
mainfrom
fix/chown-root-owned-top-level-files-35098

Conversation

@benbarclay

Copy link
Copy Markdown
Collaborator

Summary

Salvage of #35098 (@x1am1) — fixes the same gateway-restart-loop bug, with a targeted allowlist instead of a blanket find -user root sweep.

When docker exec <container> hermes … runs as root (the default unless -u is passed) and creates or rewrites a top-level state file under $HERMES_HOME — e.g. auth.json, state.db, gateway.lock, gateway_state.json — the file lands root-owned. The unprivileged hermes runtime then hits PermissionError on next startup, producing a gateway restart loop. The existing targeted chown in stage2-hook.sh only covers hermes-owned subdirectories, so loose top-level files are missed.

Why a salvage instead of merging #35098 directly

#35098's fix used find "$HERMES_HOME" -maxdepth 1 -user root -type f -exec chown hermes:hermes. That works for the reported case, but it walks back the deliberate targeted-ownership contract established in #19788 / PR #19795: a bind-mounted $HERMES_HOME may contain host-owned files Hermes does not manage, and those must never be chowned. A blanket -user root sweep would silently reassign any root-owned host file (a root-created secret, a mount artifact, etc.) to the hermes UID.

This PR keeps x1am1's intent but uses an explicit allowlist of hermes-owned top-level files (mirroring the file entries of hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock files), consistent with how the existing subdir chown uses a curated list.

Credit preserved via Co-authored-by: x1am1 and an AUTHOR_MAP entry so x1am1 is attributed in release notes.

Verification

  • New regression test (tests/tools/test_stage2_hook_toplevel_chown.py): asserts the allowlist covers the reported-broken files, that it skips absent/non-allowlisted files, and that no find -user root blanket sweep is reintroduced.
  • Existing test_stage2_hook_puid_pgid.py still green.
  • End-to-end against a real build: started a container, used docker exec -u root to create root-owned auth.json/state.db/gateway.lock and a non-allowlisted host_secret.json, restarted, and confirmed the allowlisted files were reset to hermes while host_secret.json kept its root ownership.

Closes #35098.

The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.

Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.

This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.

Co-authored-by: x1am1 <2663402852@qq.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: fix/chown-root-owned-top-level-files-35098 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9572 on HEAD, 9570 on base (🆕 +2)

🆕 New issues (2):

Rule Count
unresolved-import 1
no-matching-overload 1
First entries
tests/tools/test_stage2_hook_toplevel_chown.py:29: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_stage2_hook_toplevel_chown.py:109: [no-matching-overload] no-matching-overload: No overload of function `run` matches arguments

✅ Fixed issues: none

Unchanged: 4959 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working area/docker Docker image, Compose, packaging backend/docker Docker container execution P2 Medium — degraded but workaround exists labels Jun 1, 2026
@benbarclay benbarclay merged commit bdceedf into main Jun 1, 2026
23 of 25 checks passed
@benbarclay benbarclay deleted the fix/chown-root-owned-top-level-files-35098 branch June 1, 2026 04:38
JoeKowal pushed a commit to JoeKowal/hermes-agent that referenced this pull request Jun 4, 2026
…search#35098) (NousResearch#36236)

The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.

Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.

This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from NousResearch#19788 / PR NousResearch#19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.

Co-authored-by: x1am1 <2663402852@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/docker Docker image, Compose, packaging backend/docker Docker container execution P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants