Skip to content

fix: use atomic writes for persistent JSON state files#27913

Open
vanthinh6886 wants to merge 1 commit into
NousResearch:mainfrom
vanthinh6886:fix/non-atomic-writes-skills-hub
Open

fix: use atomic writes for persistent JSON state files#27913
vanthinh6886 wants to merge 1 commit into
NousResearch:mainfrom
vanthinh6886:fix/non-atomic-writes-skills-hub

Conversation

@vanthinh6886

Copy link
Copy Markdown
Contributor

Problem

Several files write persistent JSON state (installed skills registry, taps config, sticker cache, environment store) using write_text() directly. If the process crashes or loses power mid-write, the file is left in a partially-written state — corrupted JSON that causes JSONDecodeError on next read.

Fix

Replace write_text() with the existing atomic_json_write() helper from utils.py, which uses:

  1. tempfile.mkstemp() — write to a temp file
  2. f.flush() + os.fsync() — ensure data hits disk
  3. os.replace() — atomically swap temp file into place

This is the same pattern already used by gateway/status.py, gateway/pairing.py, and cron/jobs.py.

Files Changed

File What was at risk
tools/skills_hub.py SkillLockFile.save() — installed skills registry
tools/skills_hub.py TapsFile.save() — skill taps configuration
gateway/sticker_cache.py _save_cache() — sticker description cache
tools/environments/base.py _save_json_store() — generic env backend store

Before vs After

Scenario Before After
Crash mid-write Corrupted JSON, JSONDecodeError on next read Previous version intact, no data loss
Normal write Works Works (temp + atomic replace)

Tests

Existing tests cover the happy path. Atomic write correctness is already validated by the atomic_json_write() implementation and its existing test coverage in tests/test_utils.py.

Replace direct write_text() calls with the existing atomic_json_write()
helper (utils.py) which uses temp file + fsync + os.replace to prevent
corruption on crash or power loss.

Files fixed:
- tools/skills_hub.py: SkillLockFile.save() and TapsFile.save()
  (installed skills registry and taps config)
- gateway/sticker_cache.py: _save_cache()
  (sticker description cache)
- tools/environments/base.py: _save_json_store()
  (generic JSON store used by environment backends)
@alt-glitch alt-glitch added type/bug Something isn't working comp/tools Tool registry, model_tools, toolsets comp/gateway Gateway runner, session dispatch, delivery tool/skills Skills system (list, view, manage) duplicate This issue or pull request already exists P3 Low — cosmetic, nice to have labels May 18, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #20532 — same files (skills_hub.py, environments/base.py) and same fix (replace write_text() with atomic_json_write()). #20532 is a superset covering 6 call sites. Also overlaps with #16440 (skills_hub only).

@BoardJames-Bot

Copy link
Copy Markdown

Board James triage pass: this PR's check-attribution failure is the same missing-author-map issue as #27909/#27910: vanthinh6886@gmail.com (vanthinh6886) is not in scripts/release.py AUTHOR_MAP. I opened #27931 with the central fix ("vanthinh6886@gmail.com": "vanthinh6886"). The test job did not expose a branch-local assertion failure in logs; it was cancelled at ~95% progress. After #27931 lands, please rebase/rerun checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery comp/tools Tool registry, model_tools, toolsets duplicate This issue or pull request already exists P3 Low — cosmetic, nice to have tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants