Skip to content

fix(skills-hub): use atomic writes for lock.json, taps.json, and index cache#29699

Open
annguyenNous wants to merge 1 commit into
NousResearch:mainfrom
annguyenNous:fix/skills-hub-atomic-writes
Open

fix(skills-hub): use atomic writes for lock.json, taps.json, and index cache#29699
annguyenNous wants to merge 1 commit into
NousResearch:mainfrom
annguyenNous:fix/skills-hub-atomic-writes

Conversation

@annguyenNous

Copy link
Copy Markdown
Contributor

Problem

Three file write sites in tools/skills_hub.py use direct write_text() which truncates the file before writing new content. If the process is killed mid-write (SIGKILL, OOM, power loss), the file is left empty or partially written, corrupting the data.

Affected files:

  1. HubLockFile.save() (line 2775) — writes lock.json which coordinates concurrent skill installs across multiple processes. Corruption here can break all subsequent skill operations.
  2. HubTapsFile.save() (line 2842) — writes taps.json (skill registry sources). Corruption loses all configured taps.
  3. Index cache write (line 3129) — writes the skills index cache. Corruption is recoverable (cache miss) but causes unnecessary re-fetching.

Fix

Replace all three write_text(json.dumps(...)) calls with atomic_json_write() from utils.py, which already implements the correct pattern: tempfile.mkstemp() + fsync() + os.replace().

Diff: +4 lines, -4 lines (1 file)

Before vs After

Scenario Before After
Process killed mid-write File corrupted (empty/partial) Previous version intact
Concurrent writes Race condition, lost updates Atomic replace, last writer wins
Disk full during write Partial file left on disk Temp file cleaned up, original preserved

Testing

  • Verified syntax: ast.parse() passes
  • atomic_json_write() is already used extensively in the codebase (cron/jobs.py, agent/memory_manager.py, etc.)
  • No behavior change for normal operation — same JSON output, same indent=2

@alt-glitch alt-glitch added type/bug Something isn't working tool/skills Skills system (list, view, manage) comp/tools Tool registry, model_tools, toolsets P3 Low — cosmetic, nice to have labels May 21, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #16440 (same scope: atomic writes for lock.json and taps.json in skills_hub.py). Also a subset of #29019 which covers these files plus additional json.loads guards and atomic writes across the codebase.

HubLockFile.save() and TapsFile.save() used Path.write_text() which
can leave corrupt JSON if the process is interrupted mid-write (crash,
disk full, SIGKILL). On next startup the JSON parse fails and the
fallback returns empty data — effectively forgetting all hub-installed
skills and configured taps.

Use tempfile.mkstemp + fsync + os.replace for atomic writes, matching
the pattern already used by skill_usage.py and utils.py:atomic_json_write.

Fixes corruption risk for hub skill state and tap configuration.
@annguyenNous annguyenNous force-pushed the fix/skills-hub-atomic-writes branch from b0886c6 to ea7100f Compare May 29, 2026 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/tools Tool registry, model_tools, toolsets P3 Low — cosmetic, nice to have tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants