fix(cron): restore jobs.json emptied by config migration on update #34602
Closed
Bartok9 wants to merge 1 commit into
Config-version migrations have been observed to leave cron/jobs.json valid-but-empty after `hermes update`, silently dropping every scheduled job (NousResearch#34600). The existing malformed-shape guards in cron/jobs.py don't catch this because {"jobs": []} is valid JSON.

Add restore_cron_jobs_if_emptied() as a post-migration safety net: if the live cron/jobs.json now has zero jobs while the pre-update snapshot held one or more, restore the snapshot copy in place and warn loudly.

The check is conservative: it only restores on unambiguous evidence of loss (snapshot had jobs, live file readable-and-empty). A user who genuinely cleared their jobs is never second-guessed, and an unreadable live file is left untouched so real corruption still surfaces.

Wired into _cmd_update_impl after migrate_config(), reusing the existing pre-update quick snapshot (which already captures cron/jobs.json).

Closes NousResearch#34600
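The ordering described in the commit message (snapshot first, then migration, then the safety-net check) can be sketched roughly as follows. This is only an illustration of the call order, not the code in `_cmd_update_impl`: the function `run_post_migration_check`, its four injected callables, and the warning text are all hypothetical names chosen for this sketch.

```python
from typing import Callable, Optional


def run_post_migration_check(
    take_snapshot: Callable[[], str],
    migrate: Callable[[], None],
    restore_check: Callable[[str], Optional[dict]],
    warn: Callable[[str], None],
) -> Optional[dict]:
    """Hypothetical outline of the update flow's ordering (assumed, not the real code)."""
    # 1. Capture state before anything is changed.
    snapshot_id = take_snapshot()
    # 2. Run the config migration, which may empty cron/jobs.json.
    migrate()
    # 3. Only now compare the live file against the snapshot.
    result = restore_check(snapshot_id)
    if result is not None:
        # Illustrative wording only; the PR's actual message is not shown here.
        warn(f"cron/jobs.json was emptied during update; restored from {snapshot_id}")
    return result
```

Passing the steps in as callables keeps the sketch independent of the real CLI and makes the ordering easy to check with stubs.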
Contributor
Merged via #34840. Your snapshot auto-restore safety net was cherry-picked onto current main with your authorship preserved (commit 3845d86). It's bundled with @sweetcornna's disk-cleanup fix (#33834), which addressed the actual root cause: the disk-cleanup plugin was tracking jobs.json as disposable cron-output and deleting it after 14 days. Your net catches any other emptying path. Thanks!
Summary
`hermes update` config migration can leave `cron/jobs.json` valid-but-empty, silently dropping every scheduled job.

Motivation
Closes #34600.
After a config-version migration (e.g. 23 → 24) during `hermes update`, `cron/jobs.json` was found valid-but-empty: all scheduled jobs gone, no warning, no auto-restore. The existing malformed-shape guards in `cron/jobs.py` (#23002, #20767, #19013) don't catch this case because `{"jobs": []}` is perfectly valid JSON, just empty. The user only noticed hours later when expected reports stopped arriving.

The update flow already takes a `pre-update` quick snapshot that captures `cron/jobs.json` (see `_QUICK_STATE_FILES`). This change uses that snapshot as the recovery source.

What this does
`restore_cron_jobs_if_emptied(snapshot_id)` (new, in `hermes_cli/backup.py`):
- Restores `cron/jobs.json` only when the live file is readable-and-empty (0 jobs) and the snapshot held ≥1 job.
- Returns `None` on the healthy path (no noise), or a small result dict on restore so the caller can warn.

Wired into `_cmd_update_impl` right after `migrate_config()`. On restore it prints a warning.

Why conservative-by-design
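The conservative rule (restore only when the snapshot had jobs and the live file is readable but empty) can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation in `hermes_cli/backup.py`: the real function takes a `snapshot_id`, whereas this sketch takes two file paths directly, and the helper `count_jobs`, the name `restore_if_emptied`, and the keys of the result dict are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def count_jobs(path: Path) -> Optional[int]:
    """Return the number of jobs in a jobs.json file, or None if it is unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, permission error, bad encoding, or invalid JSON.
        return None
    jobs = data.get("jobs") if isinstance(data, dict) else None
    # A non-list "jobs" value is treated as unreadable, not as empty.
    return len(jobs) if isinstance(jobs, list) else None


def restore_if_emptied(live: Path, snapshot: Path) -> Optional[dict]:
    """Hypothetical sketch: restore `live` from `snapshot` only on clear loss."""
    live_count = count_jobs(live)
    snap_count = count_jobs(snapshot)
    # Healthy path: live file has jobs, or is unreadable (leave it alone so
    # real corruption still surfaces), or the snapshot has nothing to restore.
    if live_count != 0 or not snap_count:
        return None
    shutil.copy2(snapshot, live)
    # Assumed result shape; the PR only says "a small result dict".
    return {"restored": True, "jobs": snap_count}
```

Treating an unreadable live file as "do nothing" rather than "empty" is what keeps the check from masking genuine corruption, and requiring a non-empty snapshot means a user who cleared their jobs before updating is left alone.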
Verification
- `python3 -m pytest tests/hermes_cli/test_backup.py`: 105 passed (6 new in `TestRestoreCronJobsIfEmptied`).
- No changes to `cron/jobs.py` or the migration steps themselves; this is an additive recovery layer, so existing migration behavior is untouched.