fix: use atomic writes for persistent state files to prevent corruption#20532
Open
vominh1919 wants to merge 1 commit into
Open
fix: use atomic writes for persistent state files to prevent corruption#20532vominh1919 wants to merge 1 commit into
vominh1919 wants to merge 1 commit into
Conversation
Non-atomic write_text() calls can leave files in a partially-written state if the process is killed mid-write. This causes JSONDecodeError on next read (for JSON stores) or truncated output (for text files). Fixed 6 locations: - tools/environments/base.py: _save_json_store() — JSON state for execution environments. Corruption breaks all subsequent reads. - tools/skills_hub.py: InstalledSkillsRegistry.save() — installed skills metadata. Corruption loses all installed-skill tracking. - tools/skills_hub.py: TapsRegistry.save() — taps data. - tools/skills_hub.py: index cache write — skills hub index cache. - gateway/delivery.py: _save_summary_output() — cron delivery output. - gateway/delivery.py: _save_full_output() — full cron output. For JSON writes, reuses the existing atomic_json_write() from utils.py (which already handles temp file + fsync + os.replace + symlink resolution). For text writes in delivery.py, added a local _atomic_write_text() helper with the same pattern. The codebase already uses atomic writes in several places: - run_agent.py: atomic_json_write() for session logs - gateway/run.py: tmp + replace for response/pending files - tools/mcp_oauth.py: tmp + rename for OAuth tokens This commit brings the remaining persistent state writes to parity.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Non-atomic
write_text()calls can leave files in a partially-written state if the process is killed mid-write (OOM kill, power loss, SIGKILL). This causes:json.loads()raisesJSONDecodeErroron next read, breaking all functionality that depends on the storeThe codebase already uses atomic writes in several places (
run_agent.pyviaatomic_json_write(),gateway/run.pyvia tmp+replace,tools/mcp_oauth.pyvia tmp+rename). This PR brings the remaining persistent state writes to parity.Fix
Replace
write_text()with atomic write patterns for 6 locations across 3 files.Files changed:
tools/environments/base.py_save_json_store()tools/skills_hub.pyInstalledSkillsRegistry.save()tools/skills_hub.pyTapsRegistry.save()tools/skills_hub.pygateway/delivery.py_save_summary_output()gateway/delivery.py_save_full_output()Implementation:
atomic_json_write()fromutils.py(handles temp file + fsync +os.replace+ symlink resolution)_atomic_write_text()helper with the same pattern (tempfile + fsync +os.replace)Before vs After
Tests
The fix is defensive-only — it cannot break existing behavior since atomic writes produce identical output to non-atomic writes under normal operation. The difference only matters under abnormal termination, which is difficult to test in unit tests.
References
utils.py:85—atomic_json_write()implementationrun_agent.py:4377— existing atomic write usage for session logsgateway/run.py:4788— existing tmp+replace pattern for response filestools/mcp_oauth.py:165— existing tmp+rename pattern for OAuth tokens