fix(cron): restore jobs.json emptied by config migration on update #34602

Closed
Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/cron-jobs-migration-loss-34600

Conversation

@Bartok9 commented May 29, 2026

Contributor

Summary

  • hermes update config migration can leave cron/jobs.json valid-but-empty, silently dropping every scheduled job.
  • This adds a post-migration safety net that auto-restores the jobs from the pre-update snapshot and warns the user.

Motivation

Closes #34600.

After a config-version migration (e.g. 23 → 24) during hermes update, cron/jobs.json was found valid-but-empty — all scheduled jobs gone, no warning, no auto-restore. The existing malformed-shape guards in cron/jobs.py (#23002, #20767, #19013) don't catch this case because {"jobs": []} is perfectly valid JSON, just empty. The user only noticed hours later when expected reports stopped arriving.
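
To illustrate why a shape guard alone misses this case, here is a minimal sketch. `looks_well_formed` is a hypothetical stand-in, not the actual guard code in cron/jobs.py; it only assumes a guard that checks for a dict holding a list under `"jobs"`.

```python
import json


def looks_well_formed(raw: str) -> bool:
    """Hypothetical shape guard: accept a dict with a list under "jobs"."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("jobs"), list)


# A populated file and an emptied file are indistinguishable to a shape check.
populated_ok = looks_well_formed('{"jobs": [{"id": "daily-report"}]}')
emptied_ok = looks_well_formed('{"jobs": []}')
truncated_ok = looks_well_formed('{"jobs": ')
```

Only the truncated input is rejected; the emptied file passes, so detecting the loss requires comparing against an earlier known-good copy.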

The update flow already takes a pre-update quick snapshot that captures cron/jobs.json (see _QUICK_STATE_FILES). This change uses that snapshot as the recovery source.

What this does

restore_cron_jobs_if_emptied(snapshot_id) (new, in hermes_cli/backup.py):

  • Compares the current cron job count against the pre-update snapshot.
  • Restores the snapshot's cron/jobs.json only when the live file is readable-and-empty (0 jobs) and the snapshot held ≥1 job.
  • Returns None on the healthy path (no noise), or a small result dict on restore so the caller can warn.
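
The decision logic described above could be sketched roughly as follows. This is not the code from hermes_cli/backup.py: the function here takes explicit file paths instead of a snapshot id, and the names `count_jobs`, `restore_if_emptied`, and the `"restored"` key of the result dict are assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def count_jobs(path: Path) -> Optional[int]:
    """Return the number of jobs in a jobs.json file, or None if unknown."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing, unreadable, or not valid JSON
    if isinstance(data, dict):
        data = data.get("jobs")
    # A bare list is accepted here as the legacy snapshot shape.
    return len(data) if isinstance(data, list) else None


def restore_if_emptied(live: Path, snapshot: Path) -> Optional[dict]:
    """Copy `snapshot` over `live` only on unambiguous evidence of loss."""
    try:
        live_count = count_jobs(live)
        snap_count = count_jobs(snapshot)
        # Trigger only when live is readable-and-empty and the snapshot had jobs.
        if live_count != 0 or not snap_count:
            return None
        shutil.copy2(snapshot, live)
        return {"restored": snap_count}
    except Exception:
        return None  # a safety net must not break the update
```

Treating an unknown count (`None`) differently from a count of zero is what keeps a corrupt live file from being overwritten.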

Wired into _cmd_update_impl right after migrate_config(). On restore it prints:

  ⚠️  cron/jobs.json was emptied during this update — restored N job(s) from pre-update snapshot <id>.
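
A rough sketch of how the call site might look, under stated assumptions: `migrate` and `restore` are passed in as callables so the example stays self-contained, the `"restored"` key is a hypothetical name for the job count in the result dict, and the warning wording is simplified from the message above.

```python
from typing import Callable, Optional


def format_restore_warning(result: Optional[dict], snapshot_id: str) -> Optional[str]:
    """Build the user-facing warning, or None on the healthy path."""
    if not result:
        return None
    return (
        "cron/jobs.json was emptied during this update; "
        f"restored {result['restored']} job(s) "
        f"from pre-update snapshot {snapshot_id}."
    )


def run_post_migration_check(
    migrate: Callable[[], None],
    restore: Callable[[str], Optional[dict]],
    snapshot_id: str,
) -> Optional[str]:
    """Run the migration, then the safety net, and print a warning on restore."""
    migrate()
    result = restore(snapshot_id)  # expected to return None when nothing was lost
    message = format_restore_warning(result, snapshot_id)
    if message:
        print(message)
    return message
```

Running the check immediately after the migration keeps the window in which the live file is empty as short as possible.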

Why conservative-by-design

  • A user who genuinely cleared all their jobs is never second-guessed (snapshot-had-jobs + live-is-empty is the only trigger).
  • An unreadable/corrupt live file (count unknown) is left untouched, so real corruption still surfaces rather than being silently overwritten.
  • Never raises — a safety-net failure can't break an otherwise-good update.

Verification

  • python3 -m pytest tests/hermes_cli/test_backup.py: 105 passed (6 new in TestRestoreCronJobsIfEmptied).
  • New tests cover: restore-on-empty, no-op when live still has jobs, no-op when snapshot had none, no-op on unreadable live file, no-op on missing snapshot id, and legacy bare-list snapshot shape.
  • Did not change cron/jobs.py or the migration steps themselves — this is an additive recovery layer, so existing migration behavior is untouched.

Config-version migrations have been observed to leave cron/jobs.json
valid-but-empty after `hermes update`, silently dropping every scheduled
job (NousResearch#34600). The existing malformed-shape guards in cron/jobs.py don't
catch this because {"jobs": []} is valid JSON.

Add restore_cron_jobs_if_emptied() as a post-migration safety net: if the
live cron/jobs.json now has zero jobs while the pre-update snapshot held
one or more, restore the snapshot copy in place and warn loudly. The
check is conservative — it only restores on unambiguous evidence of loss
(snapshot had jobs, live file readable-and-empty), so a user who genuinely
cleared their jobs is never second-guessed and an unreadable live file is
left untouched so real corruption still surfaces.

Wired into _cmd_update_impl after migrate_config(), reusing the existing
pre-update quick snapshot (which already captures cron/jobs.json).

Closes NousResearch#34600
@alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/cron (Cron scheduler and job management), comp/cli (CLI entry point, hermes_cli/, setup wizard), and area/config (Config system, migrations, profiles) on May 29, 2026
@teknium1

Contributor

Merged via #34840. Your snapshot auto-restore safety net was cherry-picked onto current main with your authorship preserved (commit 3845d86). It's bundled with @sweetcornna's disk-cleanup fix (#33834), which addressed the actual root cause: the disk-cleanup plugin was tracking jobs.json as disposable cron-output and deleting it after 14 days. Your net catches any other emptying path. Thanks!

@teknium1 closed this May 29, 2026

Development

Successfully merging this pull request may close these issues.

fix(cron): config migration (23 to 24) may silently clear cron/jobs.json
