feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ by teknium1 · Pull Request #17307 · NousResearch/hermes-agent

teknium1 · 2026-04-29T06:22:21Z

Summary

Every curator pass now writes a dated report directory at ~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/ containing a machine-readable run.json and a human-readable REPORT.md. This answers 'what did the curator actually do last night?' without digging through agent.log.

Where reports live

~/.hermes/logs/curator/ — alongside agent.log and gateway.log (profile-aware via get_hermes_home()). Not inside ~/.hermes/skills/, which holds user-authored content — reports are operational telemetry and belong with the logs.
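As a rough illustration of the location rule above, the reports root can be derived from the Hermes home directory. This is a sketch under assumptions: `hermes_home()` is a hypothetical stand-in for the project's `get_hermes_home()`, and the `HERMES_HOME` environment override is assumed from the validation notes rather than confirmed against the code.

```python
import os
from pathlib import Path


def hermes_home() -> Path:
    """Hypothetical stand-in for get_hermes_home().

    Assumption: a non-empty HERMES_HOME environment variable overrides
    the default ~/.hermes location.
    """
    override = os.environ.get("HERMES_HOME")
    return Path(override) if override else Path.home() / ".hermes"


def reports_root() -> Path:
    """Reports sit next to agent.log and gateway.log, not under skills/."""
    return hermes_home() / "logs" / "curator"
```

Deriving the path from the home directory rather than hard-coding it is what would keep reports separated per profile.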

What's in each report

run.json (machine-readable):

{
  "started_at": "2026-04-29T06:21:19+00:00",
  "duration_seconds": 42.0,
  "model": "anthropic/claude-opus-4.7",
  "provider": "openrouter",
  "auto_transitions": { "checked", "marked_stale", "archived", "reactivated" },
  "counts": { "before", "after", "delta", "archived_this_run", "added_this_run", "state_transitions", "tool_calls_total" },
  "tool_call_counts": { "skills_list": 1, "skill_manage": 3, "terminal": 7 },
  "archived": [<names>],
  "added": [<names>],
  "state_transitions": [{"name", "from", "to"}],
  "llm_final": "...full untruncated final response...",
  "llm_summary": "240-char cap version",
  "llm_error": null,
  "tool_calls": [{"name", "arguments"}, ...]
}
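Because run.json is plain JSON, a script can consume it directly. The following is a minimal sketch of a reader that condenses one report into a single line. It uses only key names listed above; the function name `summarize_run` and the output format are illustrative assumptions, not part of this PR.

```python
import json
from pathlib import Path


def summarize_run(report_dir: Path) -> str:
    """Build a one-line summary from a report directory's run.json."""
    run = json.loads((report_dir / "run.json").read_text(encoding="utf-8"))
    # tool_call_counts maps tool name -> number of calls, e.g. {"terminal": 7}
    tool_calls = sum(run.get("tool_call_counts", {}).values())
    archived = run.get("archived", [])
    added = run.get("added", [])
    # llm_error is null (None) on a clean run
    status = "error" if run.get("llm_error") else "ok"
    return (
        f"{run.get('started_at', '?')} {run.get('model', '?')} "
        f"[{status}] tool_calls={tool_calls} "
        f"archived={len(archived)} added={len(added)}"
    )
```

Using `.get()` with defaults keeps the reader tolerant of reports written on the error path, where some fields may be missing or empty.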

REPORT.md (human-readable):

- Model + provider + duration + before/after skill counts header
- Auto-transitions section (pure, no LLM): checked / marked stale / archived / reactivated
- LLM consolidation section: tool call counts by name, archived-this-run, new-skills-this-run, state transitions
- Full archived list (first 50 inline + run.json pointer for the rest), with restore-command reminder
- Full new-skills list (class-level umbrellas the curator created)
- State transitions (active → stale, stale → archived, etc.)
- Full LLM final summary verbatim (no 240-char truncation)
- Recovery footer: hermes curator restore <name> + the archive dir path
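The "first 50 inline, pointer for the rest" rule for the archived list can be sketched as follows. This is not the PR's `_render_report_markdown()`; the heading text, the overflow wording, and the function name are assumptions made for illustration. Only the 50-item limit and the restore command come from the description above.

```python
from typing import List

INLINE_LIMIT = 50  # the description says the first 50 names are shown inline


def render_archived_section(archived: List[str], limit: int = INLINE_LIMIT) -> str:
    """Render the archived-skills list, pointing at run.json for any overflow."""
    lines = [f"## Archived this run ({len(archived)})", ""]
    lines += [f"- {name}" for name in archived[:limit]]
    remaining = len(archived) - limit
    if remaining > 0:
        # Hypothetical wording for the pointer to the full machine-readable list
        lines.append(f"- ... and {remaining} more (see run.json)")
    lines += ["", "Restore with: hermes curator restore <name>"]
    return "\n".join(lines)
```

Capping the inline list keeps REPORT.md readable after a large consolidation, while run.json still carries every name.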

Internals

- _run_llm_review() now returns a structured dict (final, summary, model, provider, tool_calls, error) instead of a truncated string, so the reporter has full fidelity
- Report writer is fully best-effort: any failure is logged at DEBUG and never breaks the curator itself
- Same-second rerun collisions get a numeric disambiguator (20260429-062119-2/) instead of clobbering
- Report path is stamped into .curator_state as last_report_path so hermes curator status can surface a last report: <path> line
- Tool-call arguments truncated at 400 chars in the report (full raw content stays in the LLM session history if needed)
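The collision handling, best-effort writing, and argument truncation described above could look roughly like the sketch below. All function names here are hypothetical and this is not the code in agent/curator.py. The suffix starting at 2 is inferred from the `20260429-062119-2/` example, and whether the real truncation appends a marker is not stated, so this version simply cuts.

```python
import logging
from datetime import datetime
from pathlib import Path
from typing import Mapping, Optional

logger = logging.getLogger("curator.report")  # hypothetical logger name
MAX_ARG_CHARS = 400  # truncation limit stated in the description


def unique_report_dir(root: Path, now: datetime) -> Path:
    """Create a fresh directory named YYYYMMDD-HHMMSS, adding -2, -3, ... on collision."""
    root.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    candidate = root / stamp
    suffix = 2
    while True:
        try:
            # mkdir fails if the directory exists, so creation doubles as the check
            candidate.mkdir()
            return candidate
        except FileExistsError:
            candidate = root / f"{stamp}-{suffix}"
            suffix += 1


def truncate_arguments(arguments: str, limit: int = MAX_ARG_CHARS) -> str:
    """Cap a tool call's argument string for the report."""
    return arguments if len(arguments) <= limit else arguments[:limit]


def write_report_best_effort(
    root: Path, now: datetime, files: Mapping[str, str]
) -> Optional[Path]:
    """Write report files; on any failure, log at DEBUG and return None."""
    try:
        report_dir = unique_report_dir(root, now)
        for name, body in files.items():
            (report_dir / name).write_text(body, encoding="utf-8")
        return report_dir
    except Exception:
        logger.debug("curator report write failed", exc_info=True)
        return None
```

Letting `mkdir` itself detect the collision avoids a separate exists-then-create step, which could race if two passes started in the same second.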

Changes

| File | What |
| --- | --- |
| `agent/curator.py` | `_reports_root()`, `_write_run_report()`, `_render_report_markdown()`; `_run_llm_review()` refactored to return dict; `_llm_pass()` snapshots state before+after and calls the reporter |
| `hermes_cli/curator.py` | `hermes curator status` now prints `last report: <path>` when set |
| `tests/agent/test_curator.py` | Existing stub test updated for new dict return shape |
| `tests/agent/test_curator_reports.py` (new) | 7 tests: report location (logs not skills), both files written, JSON shape + diff accuracy, markdown structure, error path still writes, state transitions captured, same-second uniqueness |

Validation

- 73/73 curator + skill_usage + reports tests pass (7 new + 66 existing)
- Live E2E: ran a synchronous pass against a 1-skill temp HERMES_HOME with a stubbed LLM. Report written to logs/curator/20260429-062119/REPORT.md + run.json, state file correctly stamped with last_report_path. Confirmed markdown is human-readable and JSON is machine-parseable end to end.

Sample output

$ hermes curator status
curator: ENABLED
  runs:           3
  last run:       2d ago
  last summary:   auto: no changes; llm: Consolidated nothing; tiny test run.
  last report:    /home/me/.hermes/logs/curator/20260429-062119
  interval:       every 7d
  stale after:    30d unused
  archive after:  90d unused

Retention

No automatic pruning. Each report is ~1-5 KB. A year of weekly runs is <300 KB. Can add curator.keep_last_n_reports later if anyone asks.
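For scale, the upper bound works out to 52 weekly runs × 5 KB = 260 KB, which is consistent with the <300 KB figure. If a retention option were added later, pruning could be as simple as the sketch below. It is purely hypothetical: `prune_reports` and its `keep_last_n` parameter do not exist in this PR, and the ordering assumes directory names follow the `YYYYMMDD-HHMMSS[-N]` pattern described earlier.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def _report_sort_key(path: Path) -> Tuple[str, str, int]:
    """Order by date, then time, then numeric same-second suffix (default 1)."""
    parts = path.name.split("-")
    date = parts[0]
    time = parts[1] if len(parts) > 1 else ""
    suffix = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 1
    return (date, time, suffix)


def prune_reports(root: Path, keep_last_n: int) -> List[str]:
    """Delete all but the newest keep_last_n report directories; return removed names."""
    if keep_last_n < 1:
        raise ValueError("keep_last_n must be at least 1")
    dirs = sorted((p for p in root.iterdir() if p.is_dir()), key=_report_sort_key)
    stale = dirs[:-keep_last_n]
    for path in stale:
        shutil.rmtree(path)
    return [path.name for path in stale]
```

Sorting on the parsed suffix rather than the raw name avoids misordering `-10` before `-2` when many runs share one second.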

…tor/

Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:

- `run.json`: machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any)
- `REPORT.md`: human-readable markdown with a model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command

Reports live under `logs/curator/`, not inside `skills/`. They're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`.

Internals:

- `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity
- Report writer is fully best-effort: any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run

Tests (all green):

- 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature

Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.

…17563)

Skill catalog pages (bundled/optional) were drowning out real user-guide and reference docs in search results. There are ~3100 of them and they match on almost every generic term.

- Add `ignoreFiles` regexes to docusaurus-search-local for `user-guide/skills/bundled/` and `user-guide/skills/optional/`. The two human-written catalog indexes (`reference/skills-catalog`, `reference/optional-skills-catalog`) remain indexed.
- Add a new feature page `user-guide/features/curator.md` covering the curator subsystem merged in #16049 and refined in #17307 (per-run reports): how it runs, config, CLI (`hermes curator status/run/pin/restore/...`), `.usage.json` telemetry, archival semantics, and recovery. Slotted into the Core features sidebar next to Skills.

Search index size dropped from 5822 docs to 2704 in the main section; `user-guide/features/curator` is indexed.

