fix(docker): chown hermes-owned top-level state files on boot (salvage #35098) #36236
Merged
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned *subdirectories*; loose state files living directly under $HERMES_HOME (auth.json, state.db, gateway.lock, gateway_state.json, …) are missed. When created or rewritten by `docker exec <container> hermes …` (root unless `-u` is passed) they land root-owned, and the unprivileged hermes runtime then hits PermissionError on next startup, producing a gateway restart loop.

Fix: reset ownership of an explicit allowlist of hermes-owned top-level files on every boot. The list mirrors the top-level file entries of hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock files.

This uses a targeted allowlist rather than the originally-proposed blanket `find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the targeted-ownership contract from #19788 / PR #19795: a bind-mounted $HERMES_HOME may contain host-owned files Hermes does not manage, and those must never be chowned.

Verified end-to-end: allowlisted root-owned files are reset to hermes on restart while a non-allowlisted host file keeps its root ownership.

Co-authored-by: x1am1 <2663402852@qq.com>
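The difference between the allowlist and a blanket sweep can be sketched in a few lines. The real fix lives in the shell script stage2-hook.sh; the Python below is only an illustration of the selection logic, not the project's code. The function names `select_chown_targets` and `reset_ownership` are hypothetical, and `ALLOWLIST` contains only the four file names mentioned above, not the full list.

```python
import os
from pathlib import Path

# Partial, illustrative allowlist: only the names mentioned in this PR.
# The real list in stage2-hook.sh is longer and is not reproduced here.
ALLOWLIST = ("auth.json", "state.db", "gateway.lock", "gateway_state.json")


def select_chown_targets(home, allowlist=ALLOWLIST):
    """Return allowlisted regular files that exist directly under `home`.

    Anything not named in the allowlist (for example a host-owned file in a
    bind-mounted directory) is never considered, whoever owns it.
    """
    targets = []
    for name in allowlist:
        path = Path(home) / name
        # Skip absent entries, directories, and symlinks.
        if path.is_file() and not path.is_symlink():
            targets.append(path)
    return targets


def reset_ownership(paths, uid, gid):
    """chown each selected path; changing the owner requires root privileges."""
    for path in paths:
        os.chown(path, uid, gid)
```

Because selection starts from the list of names rather than from a directory scan, adding an unrelated file to the directory cannot change what gets chowned. That is the property the targeted-ownership contract relies on.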
🔎 Lint report:

| Rule | Count |
|---|---|
| unresolved-import | 1 |
| no-matching-overload | 1 |

First entries
tests/tools/test_stage2_hook_toplevel_chown.py:29: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_stage2_hook_toplevel_chown.py:109: [no-matching-overload] no-matching-overload: No overload of function `run` matches arguments
✅ Fixed issues: none
Unchanged: 4959 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
JoeKowal pushed a commit to JoeKowal/hermes-agent that referenced this pull request on Jun 4, 2026
Summary

Salvage of #35098 (@x1am1): fixes the same gateway-restart-loop bug, with a targeted allowlist instead of a blanket `find -user root` sweep.

When `docker exec <container> hermes …` runs as root (the default unless `-u` is passed) and creates or rewrites a top-level state file under `$HERMES_HOME` (e.g. `auth.json`, `state.db`, `gateway.lock`, `gateway_state.json`), the file lands root-owned. The unprivileged `hermes` runtime then hits `PermissionError` on next startup, producing a gateway restart loop. The existing targeted chown in `stage2-hook.sh` only covers hermes-owned subdirectories, so loose top-level files are missed.

Why a salvage instead of merging #35098 directly

#35098's fix used `find "$HERMES_HOME" -maxdepth 1 -user root -type f -exec chown hermes:hermes`. That works for the reported case, but it walks back the deliberate targeted-ownership contract established in #19788 / PR #19795: a bind-mounted `$HERMES_HOME` may contain host-owned files Hermes does not manage, and those must never be chowned. A blanket `-user root` sweep would silently reassign any root-owned host file (a root-created secret, a mount artifact, etc.) to the hermes UID.

This PR keeps x1am1's intent but uses an explicit allowlist of hermes-owned top-level files (mirroring the file entries of `hermes_cli.profile_distribution.USER_OWNED_EXCLUDE` plus the runtime lock files), consistent with how the existing subdir chown uses a curated list.

Credit preserved via `Co-authored-by: x1am1` and an `AUTHOR_MAP` entry so x1am1 is attributed in release notes.

Verification

- Tests (`tests/tools/test_stage2_hook_toplevel_chown.py`): assert that the allowlist covers the reported-broken files, that it skips absent and non-allowlisted files, and that no `find -user root` blanket sweep is reintroduced.
- `test_stage2_hook_puid_pgid.py` still green.
- End-to-end: used `docker exec -u root` to create root-owned `auth.json` / `state.db` / `gateway.lock` and a non-allowlisted `host_secret.json`, restarted, and confirmed the allowlisted files were reset to `hermes` while `host_secret.json` kept its root ownership.

Closes #35098.
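The "no blanket sweep is reintroduced" check could be approximated with a static scan of the hook script's text. This is a minimal sketch under assumptions, not the contents of the actual test file: the helper name `has_blanket_root_sweep` is hypothetical, the comment stripping is deliberately naive, and a real test would read `stage2-hook.sh` from the repository and not take a string.

```python
import re

# Matches a `find` invocation that filters on `-user root` on the same line.
_BLANKET_SWEEP = re.compile(r"\bfind\b.*-user\s+root\b")


def has_blanket_root_sweep(script_text):
    """Return True if any non-comment line runs `find ... -user root`."""
    for line in script_text.splitlines():
        # Naive comment strip: drops everything after the first '#'.
        code = line.split("#", 1)[0]
        if _BLANKET_SWEEP.search(code):
            return True
    return False
```

A regression test would then assert that the helper returns `False` for the current hook script, so that reintroducing the sweep from #35098 fails the test.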