Title: Managed/shared Hermes runtime: atomic writes recreate SKILL.md and .bundled_manifest as 0600, causing permission-denied failures
Summary
In a managed/shared-runtime deployment, Hermes can recreate files inside HERMES_HOME with owner-private permissions (0600) even when the deployment expects group-shared access. This shows up most clearly for:
- ~/.hermes/skills/**/SKILL.md written via skill_manage
- ~/.hermes/skills/.bundled_manifest written via skills sync
On a system where the gateway runs as one user and interactive sessions may touch the same HERMES_HOME via another user in the same group, this leads to intermittent permission-denied failures when Hermes later scans or loads skills.
This is not just old drift: files are actively recreated with 0600 after being normalized.
Observed symptoms
- Repeated permission-denied warnings when loading skills, e.g.:
  Failed to parse skill file .../skills/smart-home/homeassistant-on-this-box/SKILL.md: [Errno 13] Permission denied
- .bundled_manifest reappears as 0600 after normalization
- SKILL.md files created/edited through Hermes can end up 0600
Environment
- NixOS managed deployment
- Shared HERMES_HOME under /var/lib/hermes/.hermes
- Gateway/service runs as user hermes
- Interactive SSH sessions may run Hermes as a different user but point at the same HERMES_HOME
- Group-sharing expected via hermes:hermes ownership and group-readable/group-writable runtime files
Why this seems upstream, not just local policy
The local deployment shape is specific, but the file-creation bug is general:
- atomic write helpers use tempfile.mkstemp(...)
- mkstemp creates temp files with 0600
- os.replace() renames the temp file over the target, so the target ends up with the temp file's mode rather than its previous mode
- result: target files silently collapse to 0600 unless chmod is explicitly restored after replace
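The chain above can be checked in isolation. The following is a minimal sketch, not Hermes code: naive_atomic_write is a hypothetical stand-in with the same mkstemp-then-os.replace shape as the helpers in this report, run against a throwaway directory on a POSIX system.

```python
import os
import stat
import tempfile
from pathlib import Path


def naive_atomic_write(target: Path, content: str) -> None:
    """Hypothetical stand-in: mkstemp in the target dir, write, os.replace."""
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), prefix=f".{target.name}.tmp.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        # The rename swaps in the temp file's inode, including its 0600 mode.
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def mode_of(path: Path) -> int:
    """Return only the permission bits of a path."""
    return stat.S_IMODE(path.stat().st_mode)


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "SKILL.md"
    target.write_text("old\n", encoding="utf-8")
    os.chmod(target, 0o660)  # "normalized" group-shared mode
    before = mode_of(target)
    naive_atomic_write(target, "new\n")
    after = mode_of(target)
    print(oct(before), "->", oct(after))
```

On Linux this is expected to show the group bits disappearing after a single rewrite, with no error or warning at write time.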
Affected code paths
- tools/skill_manager_tool.py
Current atomic writer:
def _atomic_write_text(file_path: Path, content: str, encoding: str = "utf-8") -> None:
    ...
    file_path.parent.mkdir(parents=True, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(
        dir=str(file_path.parent),
        prefix=f".{file_path.name}.tmp.",
        suffix="",
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(content)
        os.replace(temp_path, file_path)
    ...
This path is used for SKILL.md writes and edits.
- tools/skills_sync.py
Current manifest writer:
def _write_manifest(entries: Dict[str, str]):
    ...
    fd, tmp_path = tempfile.mkstemp(
        dir=str(MANIFEST_FILE.parent),
        prefix=".bundled_manifest_",
        suffix=".tmp",
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, MANIFEST_FILE)
    ...
This recreates .bundled_manifest as 0600.
Relevant context in config code
The config layer already acknowledges that managed installs want different permissions:
def _secure_dir(path):
    ...
    Skipped in managed mode — the NixOS module sets group-readable
    permissions (0750) so interactive users in the hermes group can
    share state with the gateway service.
    ...
def _secure_file(path):
    ...
    Skipped in managed mode — the NixOS activation script sets
    group-readable permissions (0640) on config files.
    ...
    if is_managed() or _is_container():
        return
So Hermes already has the concept of managed/shared runtime semantics. The atomic-write paths just do not preserve those semantics.
Reproduction sketch
- Use a shared HERMES_HOME with group-based access (e.g. service user + interactive user in same group).
- In managed/shared mode, normalize skill files and the manifest to group-readable/writable (e.g. 0660 or 0640, depending on policy).
- Trigger one of:
  - create/edit/patch a local skill via skill_manage
  - run bundled skill sync that updates .bundled_manifest
- Observe that the rewritten file becomes 0600.
- A different process/user sharing the same runtime later fails to read it.
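To make the "observe" step repeatable, a small audit can list files that have lost group read access. This is only a sketch: find_group_unreadable is a hypothetical helper, not part of Hermes, and the demo runs against a throwaway tree with placeholder file contents rather than a real HERMES_HOME.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_group_unreadable(root: Path) -> List[Tuple[str, int]]:
    """Return (relative path, mode) for regular files under root lacking group read."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.lstat()
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and other non-regular entries
            mode = stat.S_IMODE(st.st_mode)
            if not mode & stat.S_IRGRP:
                hits.append((str(path.relative_to(root)), mode))
    return sorted(hits)


# Demo on a throwaway tree that mimics the layout described in this report.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "demo-skill").mkdir()
    shared = root / "demo-skill" / "SKILL.md"
    private = root / ".bundled_manifest"
    shared.write_text("placeholder\n", encoding="utf-8")
    private.write_text("placeholder\n", encoding="utf-8")
    os.chmod(shared, 0o660)   # still group-shared
    os.chmod(private, 0o600)  # collapsed to owner-private
    flagged = find_group_unreadable(root)
    for rel, mode in flagged:
        print(f"{oct(mode)} {rel}")
```

Pointing the same function at the shared skills directory before and after a skill_manage edit or a skills sync should show which files were rewritten as 0600.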
Expected behavior
In managed/shared installations, Hermes should preserve the deployment’s shared-runtime permission model after atomic writes. Rewritten runtime files should not silently fall back to owner-private 0600 unless that is explicitly the intended mode for that file.
Actual behavior
Atomic replacement recreates files with mkstemp’s default 0600 mode.
Suggested fix directions
Option A: make the atomic write helpers preserve target mode if the target already exists
- stat existing file before replace
- chmod the temp file (or final file) to match the existing mode
Option B: make atomic writes managed-aware
- if is_managed():
  - use a managed/shared file mode policy (for example 0660 or 0640, depending on file class)
  - apply chmod after os.replace
Option C: both
- preserve existing mode when present
- otherwise use a managed/shared default when in managed mode
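A rough sketch of Option C follows, assuming POSIX semantics. The function name and the default_mode parameter are hypothetical; in Hermes the caller would presumably derive default_mode from is_managed() and the file class, which is not shown here. The sketch applies chmod to the temp file before os.replace rather than after it, so the target is never briefly visible with the wrong mode.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_text_preserving_mode(
    file_path: Path,
    content: str,
    encoding: str = "utf-8",
    default_mode: Optional[int] = None,  # hypothetical: e.g. 0o660 in managed mode
) -> None:
    file_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Prefer the mode the target already has.
        mode = stat.S_IMODE(file_path.stat().st_mode)
    except FileNotFoundError:
        # New file: fall back to the caller's policy, or keep mkstemp's 0600.
        mode = default_mode

    fd, temp_path = tempfile.mkstemp(
        dir=str(file_path.parent), prefix=f".{file_path.name}.tmp."
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        if mode is not None:
            os.chmod(temp_path, mode)  # set before the rename becomes visible
        os.replace(temp_path, file_path)
    except BaseException:
        try:
            os.unlink(temp_path)
        except OSError:
            pass
        raise


with tempfile.TemporaryDirectory() as d:
    existing = Path(d) / "SKILL.md"
    existing.write_text("old\n", encoding="utf-8")
    os.chmod(existing, 0o660)
    atomic_write_text_preserving_mode(existing, "new\n")
    kept_mode = stat.S_IMODE(existing.stat().st_mode)
    kept_text = existing.read_text(encoding="utf-8")

    fresh = Path(d) / ".bundled_manifest"
    atomic_write_text_preserving_mode(fresh, "placeholder\n", default_mode=0o640)
    fresh_mode = stat.S_IMODE(fresh.stat().st_mode)

    private = Path(d) / "private.txt"
    atomic_write_text_preserving_mode(private, "x\n")
    private_mode = stat.S_IMODE(private.stat().st_mode)
```

Note that this only restores the mode. The replaced file is still owned by whichever user performed the write, and its group follows the parent directory only when that directory has the setgid bit set, as with the 2770 runtime dirs in the workaround described later in this report.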
At minimum, the following should probably stop being recreated as 0600 in managed mode:
- skills/**/SKILL.md
- skills/.bundled_manifest
- similar runtime metadata written through atomic temp-file replacement
Why this matters
This breaks a valid deployment model Hermes already partially supports:
- managed runtime
- group-shared state
- service user + interactive/operator access
Even if that deployment is not the default, Hermes already has managed-mode permission branches, so preserving file modes during atomic writes seems like the right invariant.
Local workaround used here
A local NixOS activation step was added to re-normalize:
- /var/lib/hermes/.hermes/skills/**/SKILL.md -> 0660
- /var/lib/hermes/.hermes/skills/.bundled_manifest -> 0660
- runtime dirs -> 2770
That mitigates drift, but it is policy cleanup after the fact, not a source-level fix.
Potential follow-up
If useful, I can turn this into a PR by patching:
- tools/skill_manager_tool.py::_atomic_write_text
- tools/skills_sync.py::_write_manifest
so they preserve existing mode and/or apply managed-mode-safe permissions after replace.