fix(backup): atomically restore cron/jobs.json via tempfile + rename#35594
Open
sprmn24 wants to merge 1 commit into
Open
fix(backup): atomically restore cron/jobs.json via tempfile + rename#35594sprmn24 wants to merge 1 commit into
sprmn24 wants to merge 1 commit into
Conversation
restore_cron_jobs_if_emptied used shutil.copy2 which is non-atomic: a crash mid-copy leaves a partially-written jobs.json, worse than the empty file the function was trying to fix. Replace with mkstemp + write + fsync + os.replace so the live file is either fully restored or untouched. Mirrors the pattern already used by save_jobs in cron/jobs.py (refs NousResearch#30857, NousResearch#30855, NousResearch#32233).
tonydwb
approved these changes
May 31, 2026
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved ✅
Review
Replaces a non-atomic shutil.copy2 with the correct atomic pattern (mkstemp → write → fsync → os.replace) for cron job restore, preventing corrupt JSON on crash during restore.
✅ Looks Good
- Correct pattern: Uses the same atomic write already proven in
save_jobs()(cron/jobs.py). - Proper cleanup: Temp file removed on any failure via
try/exceptwithos.unlink. - Minimal change: 15 additions, 1 deletion. Straightforward and well-documented.
- References existing precedent: Same pattern used in #30857, #30855, #32233.
Reviewed by Hermes Agent (cron job)
tonydwb
approved these changes
May 31, 2026
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved ✅
Reviewed Changes
- hermes_cli/backup.py:restore_cron_jobs_if_emptied — Replaced
shutil.copy2(snap_path, live_path)with atomic write pattern:tempfile.mkstemp→ write →os.fsync→os.replace.
✅ Looks Good
- Correctness:
shutil.copy2is not atomic — a partial write or concurrent reader could see a corruptedjobs.json. The new pattern usestempfile.mkstemp+os.fsync+os.replace(which is atomic on POSIX) to ensure the target file is either fully the new content or fully the old content. - Error handling: On any
BaseException, the temp file is cleaned up withos.unlink(wrapped in try/except OSError). Good defensive cleanup. - Edge cases:
os.replaceis cross-platform atomic on both Linux and macOS. The exclusive lock onlive_path(viaportalocker) is held during this operation in the calling context. Safe. - No security concerns: No secrets or credential exposure.
- Test coverage: Tested by existing
tests/hermes_cli/test_backup.pysuite which validates restore behavior.
Reviewed by Hermes Agent
tonydwb
approved these changes
May 31, 2026
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved
Overview
Fixes non-atomic cron job restore in restore_cron_jobs_if_emptied(). Replaces shutil.copy2() (which can leave partially-written corrupt JSON if interrupted mid-copy) with the atomic pattern: mkstemp + write + fsync + os.replace. Temp file is cleaned up on failure.
Looks Good
- Used same atomic pattern as
save_jobs()incron/jobs.py— consistency os.replace()is atomic on the same filesystem (temp file in same directory)- Temp file cleaned up on any exception via
try/except BaseException - Well-documented with clear before/after behavior
- Low blast radius: only affects the automated cron restore path
Reviewed by Hermes Agent
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
restore_cron_jobs_if_emptied()inhermes_cli/backup.pyusesshutil.copy2(snap_path, live_path)to restorecron/jobs.jsonwhen ahermes updateempties it.shutil.copy2is not atomic: if the processis killed mid-copy (SIGKILL, OOM, power loss) the live file is left partially
written — corrupt JSON that is worse than the empty file the function was
trying to fix.
This PR replaces the copy with the atomic pattern already used by
save_jobs()in
cron/jobs.py:mkstemp→write+fsync→os.replace. The temp fileis created in the same directory as the target so the final rename is guaranteed
to be on the same filesystem (atomic). On any failure the temp file is removed
and the live file is left untouched.
Related Issue
Fixes #34600 (follow-up hardening — the restore path introduced by #34840
inherited the non-atomic copy)
Type of Change
Changes Made
hermes_cli/backup.py: replaceshutil.copy2(snap_path, live_path)withmkstemp+write+fsync+os.replaceinrestore_cron_jobs_if_emptied()How to Test
hermes cron add ...)hermes updateonce so a pre-update snapshot exists~/.hermes/cron/jobs.jsonmanually (echo '{jobs:[]}' > ...)restore_cron_jobs_if_emptied(snapshot_id)— confirm jobs are restoredtime.sleep()afterfdopenwrite and kill theprocess mid-sleep — confirm no partial
.cron_restore_*.tmpfile remains andjobs.jsonis unchangedChecklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AScreenshots / Logs
Same atomic pattern already merged in this repo:
atomic_json_writemigrations)