
disk-cleanup misclassifies cron/jobs.json as cron-output and can delete the live cron registry #32164

@claw-io

Summary

The bundled disk-cleanup plugin can delete the durable cron registry because it classifies every path under HERMES_HOME/cron/ as disposable cron-output, not just the run artifacts in cron/output/.

That includes ~/.hermes/cron/jobs.json, which is the scheduler's source-of-truth job store.

Once jobs.json is auto-tracked as cron-output, the plugin's automatic cleanup can delete it, and Hermes then treats the missing registry as an empty schedule (0 jobs).

Confirmed root cause

The cron branch of guess_category() in plugins/disk-cleanup/disk_cleanup.py currently reads:

if top == "cron" or top == "cronjobs":
    return "cron-output"

This is too broad. cron/output/** holds disposable run artifacts, but the files directly under cron/ are durable scheduler state, not output.

Durable scheduler state in the same directory includes at least:

  • ~/.hermes/cron/jobs.json
  • ~/.hermes/cron/.tick.lock
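
The over-broad match is visible when only the quoted branch is re-run against the three paths in question. This is a minimal sketch, not the plugin's code: it assumes (hypothetically) that guess_category() computes rel as the path relative to HERMES_HOME and top as rel.parts[0]. The name current_guess_category and the /home/user/.hermes location are illustrative.

```python
from pathlib import Path
from typing import Optional

# Hypothetical HERMES_HOME, used only for illustration.
HERMES_HOME = Path("/home/user/.hermes")


def current_guess_category(path: Path, hermes_home: Path = HERMES_HOME) -> Optional[str]:
    """Re-creates only the quoted cron branch of guess_category().

    Assumption: rel is relative to HERMES_HOME and top is its first component.
    The real function has other branches that are not reproduced here.
    """
    rel = path.relative_to(hermes_home)
    top = rel.parts[0] if rel.parts else ""
    if top == "cron" or top == "cronjobs":
        # Matches everything under cron/, not just cron/output/.
        return "cron-output"
    return None


# Each of the three paths below prints "-> cron-output".
for p in ("cron/output/nightly/run.md", "cron/jobs.json", "cron/.tick.lock"):
    print(p, "->", current_guess_category(HERMES_HOME / p))
```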

Why this is destructive

disk-cleanup later deletes tracked cron-output entries during automatic cleanup. If jobs.json has been tracked under that category, the cleanup pass can remove the live cron registry.

After that, Hermes behaves as if there are no scheduled jobs, because a missing ~/.hermes/cron/jobs.json is interpreted as an empty job list.
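
The damage is silent because "file absent" and "no jobs" look the same to the scheduler. A minimal sketch of that behavior, assuming a hypothetical loader named load_jobs and a jobs.json that holds a JSON list; neither the name nor the schema comes from the Hermes source.

```python
import json
import tempfile
from pathlib import Path


def load_jobs(jobs_path: Path) -> list:
    """Hypothetical loader: a missing registry silently becomes an empty schedule."""
    if not jobs_path.exists():
        return []  # indistinguishable from a registry that really has 0 jobs
    with jobs_path.open(encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as home:
    jobs_path = Path(home) / "cron" / "jobs.json"
    jobs_path.parent.mkdir(parents=True)
    jobs_path.write_text(json.dumps([{"id": "nightly"}]), encoding="utf-8")
    print(len(load_jobs(jobs_path)))  # 1
    jobs_path.unlink()                # what a cron-output cleanup pass would do
    print(len(load_jobs(jobs_path)))  # 0
```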

Reproduction

  1. Enable the bundled disk-cleanup plugin.
  2. Ensure a cron registry exists at ~/.hermes/cron/jobs.json.
  3. Cause the plugin to auto-track a file directly under ~/.hermes/cron/ (for example jobs.json) through the existing guess_category() classification.
  4. Run the plugin's automatic or manual quick cleanup.
  5. Observe that the cron registry may be deleted and that a subsequent cron listing shows 0 jobs.
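
The reproduction steps can be simulated in a temporary directory without a running Hermes instance. Everything in this sketch is a hypothetical stand-in: current_guess_category reproduces only the quoted cron branch, the in-memory tracked dict replaces the plugin's tracking store, quick_cleanup deletes every tracked cron-output path, and count_jobs mirrors the reported "missing registry means 0 jobs" behavior. The plugin's real names and internals may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def current_guess_category(path: Path, hermes_home: Path) -> Optional[str]:
    # Only the quoted cron branch; top is assumed to be the first relative component.
    rel = path.relative_to(hermes_home)
    top = rel.parts[0] if rel.parts else ""
    if top == "cron" or top == "cronjobs":
        return "cron-output"
    return None


def quick_cleanup(tracked: Dict[Path, str]) -> None:
    # Hypothetical cleanup pass: delete every tracked path filed under cron-output.
    for path, category in list(tracked.items()):
        if category == "cron-output" and path.exists():
            path.unlink()
            del tracked[path]


def count_jobs(jobs_path: Path) -> int:
    # Reported behavior: a missing registry reads as an empty schedule.
    if not jobs_path.exists():
        return 0
    return len(json.loads(jobs_path.read_text(encoding="utf-8")))


def reproduce() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        jobs = home / "cron" / "jobs.json"              # step 2: registry exists
        jobs.parent.mkdir(parents=True)
        jobs.write_text(json.dumps([{"id": "nightly"}]), encoding="utf-8")

        tracked: Dict[Path, str] = {}
        category = current_guess_category(jobs, home)   # step 3: auto-track
        if category is not None:
            tracked[jobs] = category

        quick_cleanup(tracked)                          # step 4: cleanup pass
        return count_jobs(jobs)                         # step 5: 0 jobs


print(reproduce())  # 0
```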

Expected behavior

Only disposable run artifacts under ~/.hermes/cron/output/** should be classified as cron-output.

Top-level cron control-plane files must never be auto-tracked as cleanup candidates.

Actual behavior

Top-level cron files are classified as cron-output, making the scheduler registry eligible for deletion.

Proposed fix

Restrict cron-output classification to the output subtree only, e.g.:

if top == "cron" or top == "cronjobs":
    if len(rel.parts) >= 2 and rel.parts[1] == "output":
        return "cron-output"
    return None
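
A self-contained version of the proposed rule, under the same hypothetical assumption that rel is relative to HERMES_HOME and top is rel.parts[0]. The name fixed_guess_category and the two-argument signature are illustrative; returning None means "do not auto-track", which is how the proposal treats top-level cron files.

```python
from pathlib import Path
from typing import Optional


def fixed_guess_category(path: Path, hermes_home: Path) -> Optional[str]:
    """Proposed cron branch: only <top>/output/** counts as disposable cron-output."""
    rel = path.relative_to(hermes_home)
    top = rel.parts[0] if rel.parts else ""
    if top == "cron" or top == "cronjobs":
        # Classify only when the second path component is "output";
        # jobs.json, .tick.lock and other top-level cron state get None.
        if len(rel.parts) >= 2 and rel.parts[1] == "output":
            return "cron-output"
        return None
    # Non-cron categories from the real function are omitted in this sketch.
    return None


home = Path("/home/user/.hermes")  # hypothetical HERMES_HOME
for p in ("cron/output/nightly/run.md", "cron/jobs.json", "cron/.tick.lock"):
    print(p, "->", fixed_guess_category(home / p, home))
# cron/output/nightly/run.md -> cron-output
# cron/jobs.json -> None
# cron/.tick.lock -> None
```

Because the check only inspects rel.parts[1], the cron/output directory itself is also classified as cron-output, as in the proposal; anything else under cron/ is left untracked.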

Suggested regression coverage

Add tests that assert:

  • ~/.hermes/cron/output/<job>/run.md -> cron-output
  • ~/.hermes/cron/jobs.json -> None
  • ~/.hermes/cron/.tick.lock -> None
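
The three assertions could take roughly this shape as plain test functions (runnable under pytest or directly). This is a sketch: guess_category here is a local stand-in implementing the proposed rule, with an assumed (path, hermes_home) signature; a real test module would import the patched function from plugins/disk-cleanup/disk_cleanup.py instead, and that import path and signature are assumptions.

```python
from pathlib import Path
from typing import Optional

HOME = Path("/home/user/.hermes")  # hypothetical HERMES_HOME for the tests


def guess_category(path: Path, hermes_home: Path = HOME) -> Optional[str]:
    # Local stand-in for the patched plugin function (proposed cron branch only).
    rel = path.relative_to(hermes_home)
    top = rel.parts[0] if rel.parts else ""
    if top == "cron" or top == "cronjobs":
        if len(rel.parts) >= 2 and rel.parts[1] == "output":
            return "cron-output"
        return None
    return None


def test_cron_output_artifact_is_disposable() -> None:
    assert guess_category(HOME / "cron" / "output" / "nightly" / "run.md") == "cron-output"


def test_jobs_registry_is_never_tracked() -> None:
    assert guess_category(HOME / "cron" / "jobs.json") is None


def test_tick_lock_is_never_tracked() -> None:
    assert guess_category(HOME / "cron" / ".tick.lock") is None


if __name__ == "__main__":
    test_cron_output_artifact_is_disposable()
    test_jobs_registry_is_never_tracked()
    test_tick_lock_is_never_tracked()
    print("ok")
```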

Notes

This is distinct from other cron-loss issues involving:

  • profile-fragmented cron stores
  • concurrent jobs.json write races
  • permission/mode problems on jobs.json
  • dashboard update flows

Those can also produce missing or invisible jobs, but this bug is specifically about disk-cleanup deleting the registry due to overly broad cron-path classification.

Metadata

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/cron (Cron scheduler and job management)
  • comp/plugins (Plugin system and bundled plugins)
  • type/bug (Something isn't working)
