hermes doctor doesn't probe cron scheduler health — silent failures when gateway down or jobs error

## Context

Auditing the gateway/cron/batch_runner boundaries for diagnostic visibility gaps analogous to the devagentic-graph probe shipped in #17/#18. Found one: `hermes doctor` does not surface cron scheduler health.

Operators wire `hermes cron create …`, expect autonomous execution, and have no signal from `hermes doctor` when:

1. **The gateway isn't running.** Cron jobs only fire when the gateway process is up (`cron_status` checks this via `find_gateway_pids()`). If the gateway crashed or was never installed, jobs sit dormant.
2. **Recent jobs are failing.** Every job stores `last_status` + `last_error` in `~/.hermes/cron/jobs.json`. A handful of `error` states is the symptom of a broken environment (missing API key, expired auth, schedule misconfigured). Currently invisible to doctor.

`hermes cron status` shows both of these — but only when the operator knows to run it. Doctor is the canonical "tell me what's wrong" surface and should flag them.

## Fix

Add `_check_cron_scheduler()` to `hermes_cli/doctor.py`. Inert when `~/.hermes/cron/jobs.json` doesn't exist (the byte-stable default for users who never wired cron). When jobs are configured:

```
─── Cron Scheduler ───
  ✓ Gateway running          PID 12345
  ✓ 3 active job(s)          next run: 2026-05-22T23:55:00

# or, if anything's broken:

─── Cron Scheduler ───
  ✗ Gateway not running       cron jobs will NOT fire — see `hermes cron status`
  ⚠ 2 of last 10 runs failed
    - job-a (2026-05-22T22:33): timeout
    - job-b (2026-05-22T22:01): authentication failed
```

Re-uses primitives that already exist: `cron.jobs.load_jobs`, `hermes_cli.gateway.find_gateway_pids`. Imports are wrapped to fail silently if either path is missing — same defensive pattern the existing `_check_devagentic_graph` uses.

## Out of scope

- Healing failed jobs. Doctor's job is diagnostic, not remediation.
- Probing the scheduler tick cadence. Surface is `hermes cron tick` + the gateway's internal scheduler — doctor would need to read scheduler state which is more invasive than the read-only "show me what's broken" contract.

Filed by hermes-maintainer (PowerCreek). PR incoming.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hermes doctor doesn't probe cron scheduler health — silent failures when gateway down or jobs error #26

Context

Fix

Out of scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

hermes doctor doesn't probe cron scheduler health — silent failures when gateway down or jobs error #26

Description

Context

Fix

Out of scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions