Skip to content

fix(skills): transactional skills reset --restore + read-only Nix tree cleanup (#34972)#35251

Merged
teknium1 merged 3 commits into
mainfrom
hermes/hermes-958a0449
May 30, 2026
Merged

fix(skills): transactional skills reset --restore + read-only Nix tree cleanup (#34972)#35251
teknium1 merged 3 commits into
mainfrom
hermes/hermes-958a0449

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

hermes skills reset <name> --restore is now transactional and survives read-only Nix-store skill copies — it no longer corrupts the manifest into a permanent limbo state.

Root cause: reset_bundled_skill deleted the manifest entry before shutil.rmtree(dest). On Nix-store installs (read-only r-xr-xr-x dirs), rmtree failed and the function returned early — but the manifest entry was already gone, so future sync_skills runs silently skipped the skill forever (user-modified-UNTRACKED).

Closes #34972.

Changes

  • tools/skills_sync.py:
    • Reorder so the user copy is removed before the manifest entry is dropped. A failed removal now leaves the manifest untouched (action: "not_reset", message "Manifest entry preserved").
    • _rmtree_writable() chmods read-only entries and retries via onerror. The retry handler now chmods the parent directory as well as the failing path — unlinking a child requires write permission on its parent, so the original (file-only chmod) still failed on a true read-only tree.
    • Backup cleanup in sync_skills uses the same writable rmtree.
  • tests/tools/test_skills_sync.py: two regression tests —
    • restore succeeds on a full read-only tree (real Nix case: r-xr-xr-x dirs + r-- files), and
    • manifest is preserved when removal genuinely cannot proceed.

Validation

Scenario Before After
--restore on read-only files only works works
--restore on read-only dirs+files (Nix store) rmtree fails → manifest corrupted succeeds, re-baselined
Removal genuinely impossible manifest entry deleted → limbo manifest preserved, nothing changed

E2E verified against a real read-only tree (isolated HERMES_HOME, real reset_bundled_skill): restore succeeds, nested read-only file removed, bundled re-copied, manifest re-tracked. Targeted suite: tests/tools/test_skills_sync.py 51/51 pass.

Credit

Salvaged from #35131 by @annguyenNous (transaction reordering + initial _rmtree_writable), cherry-picked onto current main with authorship preserved. Follow-up commit fixes the parent-dir chmod gap and corrects the regression test.

Supersedes #35182 (@TKwave — read-only backup cleanup, a strict subset of this fix).

Infographic

skills-reset-transactional-fix

annguyenNous and others added 2 commits May 30, 2026 01:43
…le read-only files in rmtree

Two related bugs in tools/skills_sync.py affecting Nix-store and
immutable-package installs:

**#34972 — reset_bundled_skill corrupts manifest on rmtree failure:**
The function deleted the manifest entry BEFORE attempting rmtree. If
rmtree failed (read-only files from Nix store), the function returned
early — leaving the skill in a manifest-less limbo state where future
syncs silently skip it forever.

Fix: reorder steps — attempt rmtree FIRST, only delete manifest entry
after rmtree succeeds. If rmtree fails, nothing is changed.

**#34860 — stale .bak directories after sync:**
sync_skills() called shutil.rmtree(backup, ignore_errors=True) which
silently failed on read-only files, leaving persistent .bak dirs.

Fix: add _rmtree_writable() helper that makes files writable via an
onerror callback before retrying removal. Used in both sync_skills()
backup cleanup and reset_bundled_skill().

Fixes #34972
Fixes #34860
…just files

The cherry-picked fix's onerror handler chmod'd only the failing path, but
unlinking a child requires write permission on its PARENT directory. On a true
Nix-store copy (r-xr-xr-x dirs + files) rmtree still failed. Now chmod the
parent dir as well before retrying.

Also rewrites the regression test: the original asserted the helper FAILS on a
read-only dir (documenting the limitation), which is the wrong success criterion.
Split into two tests — restore succeeds on a full read-only tree (real Nix case),
and manifest is preserved when removal genuinely cannot proceed (monkeypatched).
@github-actions

github-actions Bot commented May 30, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-958a0449 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9498 on HEAD, 9498 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4925 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working tool/skills Skills system (list, view, manage) area/nix Nix flake, NixOS module, container packaging P3 Low — cosmetic, nice to have labels May 30, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Supersedes #35131 and #35182. Salvage onto current main with both transaction reorder fix and parent-dir chmod for read-only Nix trees.

The hermetic CI env (slice 4/6) redirects HERMES_HOME, so a post-restore
_read_manifest() can resolve to an empty/redirected manifest path and return
{}. Assert on sync_skills's in-memory return value (synced["copied"]) instead,
which is the resilient signal that the skill was re-copied and is no longer in
limbo.
@teknium1 teknium1 merged commit 6a08fd3 into main May 30, 2026
23 checks passed
@teknium1 teknium1 deleted the hermes/hermes-958a0449 branch May 30, 2026 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/nix Nix flake, NixOS module, container packaging P3 Low — cosmetic, nice to have tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: hermes skills reset --restore corrupts manifest when rmtree fails on read-only Nix-store dirs

3 participants