Context
Auditing the gateway/cron/batch_runner boundaries for diagnostic visibility gaps analogous to the devagentic-graph probe shipped in #17/#18. Found one: hermes doctor does not surface cron scheduler health.
Operators wire hermes cron create …, expect autonomous execution, and have no signal from hermes doctor when:
- The gateway isn't running. Cron jobs only fire when the gateway process is up (
cron_status checks this via find_gateway_pids()). If the gateway crashed or was never installed, jobs sit dormant.
- Recent jobs are failing. Every job stores
last_status + last_error in ~/.hermes/cron/jobs.json. A handful of error states is the symptom of a broken environment (missing API key, expired auth, schedule misconfigured). Currently invisible to doctor.
hermes cron status shows both of these — but only when the operator knows to run it. Doctor is the canonical "tell me what's wrong" surface and should flag them.
Fix
Add _check_cron_scheduler() to hermes_cli/doctor.py. Inert when ~/.hermes/cron/jobs.json doesn't exist (the byte-stable default for users who never wired cron). When jobs are configured:
─── Cron Scheduler ───
✓ Gateway running PID 12345
✓ 3 active job(s) next run: 2026-05-22T23:55:00
# or, if anything's broken:
─── Cron Scheduler ───
✗ Gateway not running cron jobs will NOT fire — see `hermes cron status`
⚠ 2 of last 10 runs failed
- job-a (2026-05-22T22:33): timeout
- job-b (2026-05-22T22:01): authentication failed
Re-uses primitives that already exist: cron.jobs.load_jobs, hermes_cli.gateway.find_gateway_pids. Imports are wrapped to fail silently if either path is missing — same defensive pattern the existing _check_devagentic_graph uses.
Out of scope
- Healing failed jobs. Doctor's job is diagnostic, not remediation.
- Probing the scheduler tick cadence. Surface is
hermes cron tick + the gateway's internal scheduler — doctor would need to read scheduler state which is more invasive than the read-only "show me what's broken" contract.
Filed by hermes-maintainer (PowerCreek). PR incoming.
Context
Auditing the gateway/cron/batch_runner boundaries for diagnostic visibility gaps analogous to the devagentic-graph probe shipped in #17/#18. Found one:
hermes doctordoes not surface cron scheduler health.Operators wire
hermes cron create …, expect autonomous execution, and have no signal fromhermes doctorwhen:cron_statuschecks this viafind_gateway_pids()). If the gateway crashed or was never installed, jobs sit dormant.last_status+last_errorin~/.hermes/cron/jobs.json. A handful oferrorstates is the symptom of a broken environment (missing API key, expired auth, schedule misconfigured). Currently invisible to doctor.hermes cron statusshows both of these — but only when the operator knows to run it. Doctor is the canonical "tell me what's wrong" surface and should flag them.Fix
Add
_check_cron_scheduler()tohermes_cli/doctor.py. Inert when~/.hermes/cron/jobs.jsondoesn't exist (the byte-stable default for users who never wired cron). When jobs are configured:Re-uses primitives that already exist:
cron.jobs.load_jobs,hermes_cli.gateway.find_gateway_pids. Imports are wrapped to fail silently if either path is missing — same defensive pattern the existing_check_devagentic_graphuses.Out of scope
hermes cron tick+ the gateway's internal scheduler — doctor would need to read scheduler state which is more invasive than the read-only "show me what's broken" contract.Filed by hermes-maintainer (PowerCreek). PR incoming.