feat(curator): background skill maintenance (issue #7816) by teknium1 · Pull Request #17277 · NousResearch/hermes-agent

teknium1 · 2026-04-29T05:07:47Z

Summary

Salvages the Curator from #16049 onto current main with a sharper prompt and fixed parent-config inheritance. Paired with the creation-side class-first prompt from #17213, this closes the loop on issue #7816.

How it works

Default: enabled, inactivity-triggered (no cron daemon). On CLI startup and gateway boot:

Check config + .curator_state — run if last_run_at is older than interval_hours (default 7 days) AND agent has been idle ≥ min_idle_hours (default 2).
Automatic state transitions (pure, no LLM): mark stale (>30d unused), archive (>90d unused), reactivate stale skills that were used again.
LLM umbrella-building pass: spawns a forked AIAgent (inheriting the user's main provider/model) with the agent-created candidate list + usage stats. The model uses existing tools only: skills_list/skill_view to survey, skill_manage (patch/create/write_file) to consolidate, terminal to archive.
Persist .curator_state with a summary.
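The run gate (step 1) and the pure state transitions (step 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in agent/curator.py:

- The function names `is_due` and `next_state` are hypothetical.
- The state strings are hypothetical.
- Treating a missing last_run_at as "due" is an assumption.
- The PR does not say whether a skill must pass through stale before it is archived, so this sketch decides purely by idle time.

The thresholds are the defaults quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Defaults quoted in the PR description; the constant names are illustrative.
INTERVAL_HOURS = 24 * 7          # interval_hours default (7 days)
MIN_IDLE_HOURS = 2               # min_idle_hours default
STALE_AFTER = timedelta(days=30)
ARCHIVE_AFTER = timedelta(days=90)


def is_due(now: datetime, last_run_at: Optional[datetime], last_activity_at: datetime,
           interval_hours: float = INTERVAL_HOURS,
           min_idle_hours: float = MIN_IDLE_HOURS) -> bool:
    """True only if the last pass is older than the interval AND the agent is idle enough."""
    # Assumption: a missing last_run_at (first ever run) counts as overdue.
    interval_ok = last_run_at is None or now - last_run_at > timedelta(hours=interval_hours)
    idle_ok = now - last_activity_at >= timedelta(hours=min_idle_hours)
    return interval_ok and idle_ok


def next_state(state: str, last_used_at: datetime, now: datetime, pinned: bool) -> str:
    """Pure, LLM-free transition for one skill between active, stale and archived."""
    if pinned or state == "archived":
        # Pinned skills bypass auto-transitions; archived ones come back only via restore.
        return state
    unused = now - last_used_at
    if unused > ARCHIVE_AFTER:
        return "archived"
    if unused > STALE_AFTER:
        return "stale"
    return "active"  # also reactivates a stale skill that was used again


if __name__ == "__main__":
    now = datetime(2026, 4, 29, tzinfo=timezone.utc)
    print(is_due(now, now - timedelta(days=8), now - timedelta(hours=3)))   # True
    print(next_state("stale", now - timedelta(days=1), now, pinned=False))  # active
```

Keeping the transition a pure function of (state, last use, now, pinned) is what lets this step run without an LLM and makes it trivially unit-testable.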

Invariants (all load-bearing)

Never touches bundled or hub-installed skills. Double-filtered against .bundled_manifest AND .hub/lock.json.
Never auto-deletes. Archive only — archives live in ~/.hermes/skills/.archive/ and are recoverable via hermes curator restore <skill>.
Pinned skills bypass all auto-transitions. Pinning is user-initiated; the model never pins autonomously.
Forked AIAgent inherits parent config. No re-resolution from env vars — works with OAuth-only providers and pool-backed credentials.

CLI surface

hermes curator status             # config + skill state + least-recently-used top-5
hermes curator run [--sync]       # trigger a pass now (sync = block until LLM done)
hermes curator pause / resume
hermes curator pin <skill>        # block auto-transitions on this skill
hermes curator unpin <skill>
hermes curator restore <skill>    # move from .archive/ back to active
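Archive and restore amount to moving a skill directory between the active tree and .archive/. This is a sketch under several assumptions:

- Each skill is a directory directly under the skills root.
- The function names and error types are hypothetical.
- The `upstream_names` check mirrors the restore_skill() guard described in the provenance-guard commit message on this PR (refuse a name that is now bundled or hub-installed).

```python
import shutil
from pathlib import Path
from typing import Set


def archive_skill(skills_dir: Path, name: str) -> Path:
    """Move an active skill directory into .archive/; nothing is ever deleted."""
    archive_dir = skills_dir / ".archive"
    archive_dir.mkdir(exist_ok=True)
    dest = archive_dir / name
    if dest.exists():
        raise FileExistsError(f"{name} is already archived")
    shutil.move(str(skills_dir / name), str(dest))
    return dest


def restore_skill(skills_dir: Path, name: str, upstream_names: Set[str]) -> Path:
    """Move an archived skill back into the active tree."""
    if name in upstream_names:
        # Restoring would shadow a bundled or hub-installed skill with the same name.
        raise ValueError(f"{name} is now bundled or hub-installed; refusing to shadow it")
    dest = skills_dir / name
    if dest.exists():
        raise FileExistsError(f"an active skill named {name} already exists")
    shutil.move(str(skills_dir / ".archive" / name), str(dest))
    return dest
```

Because archiving is a move rather than a delete, restore is simply the inverse move plus the name-collision checks.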

Changes vs #16049 (what this salvage adds)

Umbrella-first prompt

The original prompt let the reviewer default to keep whenever two skills weren't byte-identical. Rewritten to push active umbrella-building:

Framing: 'UMBRELLA-BUILDING consolidation pass, not a passive audit'
Explicit bar: 'would a maintainer write this as N separate skills, or one skill with N labeled subsections?'
Three consolidation methods: merge into existing umbrella / create new umbrella SKILL.md / demote to references/ templates/ scripts/ support file
Pre-empts observed bailouts: 'usage counters are zero' (judge on content, not use_count) and 'each has a distinct trigger' (pairwise distinctness is the wrong bar)

Config-aware parent fork

_run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. The fork now calls load_config() + resolve_runtime_provider() and passes explicit provider, model, api_key, base_url, api_mode so it runs on whatever the user is currently on.
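The fix boils down to building the forked agent's constructor arguments explicitly from the parent's resolved runtime, instead of letting the constructor auto-resolve. The assumptions in this sketch:

- load_config() and resolve_runtime_provider() are named in the PR, but their signatures are not, so the sketch starts from an already-resolved value.
- The `RuntimeProvider` dataclass and `fork_agent_kwargs` are hypothetical.
- The empty-model check is an added assumption.
- Only the five keyword names come from the PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class RuntimeProvider:
    """Hypothetical stand-in for whatever resolve_runtime_provider() returns."""
    provider: str
    model: str
    api_key: Optional[str]
    base_url: Optional[str]
    api_mode: Optional[str]


def fork_agent_kwargs(runtime: RuntimeProvider) -> Dict[str, Any]:
    """Explicit kwargs so the forked agent never falls back to env-var auto-resolution."""
    if not runtime.model:
        # An empty model is what produced HTTP 400 'No models provided'; fail early instead.
        raise ValueError("parent config resolved no model; refusing to fork")
    return {
        "provider": runtime.provider,
        "model": runtime.model,
        "api_key": runtime.api_key,  # may come from OAuth or a credential pool, not env vars
        "base_url": runtime.base_url,
        "api_mode": runtime.api_mode,
    }
```

Passing every field explicitly means the fork runs on exactly what the parent session resolved, whichever provider that is.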

Unbounded iteration ceiling

max_iterations=8 was way too low — a live umbrella pass takes 50-100 API calls. Raised to 9999; the natural stopping criterion is 'no more clusters worth processing'.
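The effect of the ceiling can be sketched with a loop that stops when the model has nothing left to do, keeping max_iterations only as a safety net. The assumptions:

- Treating one loop turn as one API call is a simplification.
- `run_review_loop` and `scripted_step` are hypothetical.
- 86 is the call count reported for Run 3 below.

```python
from typing import Callable

MAX_ITERATIONS = 9999  # the PR's ceiling: a safety net, not the intended stop condition


def run_review_loop(step: Callable[[], bool], max_iterations: int = MAX_ITERATIONS) -> int:
    """Call step() (one model turn) until it reports nothing left to do, or the ceiling hits."""
    turns = 0
    while turns < max_iterations:
        turns += 1
        if not step():
            # The model stopped requesting tool calls: no more clusters worth processing.
            break
    return turns


def scripted_step(total_turns: int) -> Callable[[], bool]:
    """Test double: a 'model' that wants exactly total_turns turns."""
    remaining = total_turns

    def step() -> bool:
        nonlocal remaining
        remaining -= 1
        return remaining > 0

    return step


if __name__ == "__main__":
    print(run_review_loop(scripted_step(86)))                    # 86: finishes naturally
    print(run_review_loop(scripted_step(86), max_iterations=8))  # 8: cut off mid-pass
```

With the old ceiling of 8, a pass that needs ~86 calls is truncated after surveying, which matches the passive outcome seen in Run 1.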

Changes

File	What
`tools/skill_usage.py` (new)	Sidecar `.usage.json` I/O, atomic writes, provenance filter, archive/restore helpers
`agent/curator.py` (new)	Orchestrator: config, idle gating, state-machine transitions, forked-agent LLM pass with umbrella-first prompt
`hermes_cli/curator.py` (new)	`hermes curator {status,run,pause,resume,pin,unpin,restore}` subcommand
`tests/tools/test_skill_usage.py` (new)	29 unit tests
`tests/agent/test_curator.py` (new)	29 unit tests (incl. 2 new umbrella-first prompt contracts)
`tools/skills_tool.py`	Wrap `skill_view` registry handler to bump `view_count` on success
`tools/skill_manager_tool.py`	Bump `patch_count` on patch/edit/write_file/remove_file; `forget()` record on delete
`hermes_cli/config.py`	Add `curator:` section to `DEFAULT_CONFIG` (additive — no version bump needed)
`hermes_cli/commands.py`	Add `/curator` `CommandDef` with subcommand hints
`hermes_cli/main.py`	Register `hermes curator` argparse subparser
`gateway/run.py`	Hook `maybe_run_curator` into the existing cron ticker thread
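Two pieces from the table combine naturally: atomic writes for the .usage.json sidecar, and wrapping a handler so that `view_count` is bumped only on success. This sketch rests on several assumptions:

- The on-disk layout `{skill: {counter: n}}` is an assumption.
- "Success" is taken to mean "returned without raising".
- The helper names are hypothetical.
- The real skill_view handler signature is not shown in the PR.
- The read-modify-write in `bump` is not safe against concurrent writers in other processes.

```python
import contextlib
import functools
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def atomic_write_json(path: Path, data: Dict[str, Any]) -> None:
    """Write to a temp file in the same directory, fsync it, then os.replace() it into place."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never a partial write
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def bump(path: Path, skill: str, field: str) -> None:
    """Increment one counter for one skill and persist the whole sidecar atomically."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    record = data.setdefault(skill, {})
    record[field] = record.get(field, 0) + 1
    atomic_write_json(path, data)


def counted(path: Path, field: str) -> Callable:
    """Decorator: bump `field` only after the wrapped handler returns without raising."""
    def decorator(handler: Callable[[str], Any]) -> Callable[[str], Any]:
        @functools.wraps(handler)
        def wrapper(skill: str) -> Any:
            result = handler(skill)  # an exception propagates and the counter is untouched
            bump(path, skill, field)
            return result
        return wrapper
    return decorator
```

Creating the temp file in the same directory keeps the rename on one filesystem, which is what makes os.replace() atomic on POSIX.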

Validation — live run on author's own setup

Three successive test runs against 346 agent-created skills on my own ~/.hermes/skills/:

Run	Prompt	Archives	Patches	New umbrellas	Outcome
1	original (#16049)	3	0	0	'surveyed, no action' — passive
2	consolidation	44	3	0	Found 50-skill MLOps dup bug, umbrella'd nothing
3	umbrella-first	249	3	18	Agent-created skills 346 → 118 (-66%); every archived skill's content preserved as references/ under its umbrella

Run 3 on opus-4.7 via OpenRouter: 86 API calls, ~6.5 min, 99% cache hit rate on later calls, ~$4-7. Pinned skill (hermes-agent-dev) untouched. No deletions — archives recoverable via hermes curator restore.

66/66 curator + skill_usage tests pass. 17/17 review-prompt tests still pass (no regression on #17213).

Attribution

Cherry-picked 5 commits from hermes/curator-infra preserving Teknium's authorship; 1 follow-up commit applies the umbrella-first prompt + config-aware fork improvements.

Paired with creation-side #17213 (class-first review prompt), this closes issue #7816: the creation side prefers patching/references over new narrow skills, the curator side does umbrella-building over time.

Adds the Curator, an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: it tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift.

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter)
- Never auto-deletes; archive only. Archives are recoverable via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py: sidecar .usage.json telemetry, atomic writes, provenance filter
- agent/curator.py: orchestrator covering config, idle gating, state-machine transitions (pure, no LLM), and the forked-agent review prompt
- hermes_cli/curator.py: `hermes curator {status,run,pause,resume,pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py: 29 tests
- tests/agent/test_curator.py: 25 tests

Modified files (surgical patches):
- tools/skills_tool.py: bump view_count on successful skill_view
- tools/skill_manager_tool.py: bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py: add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py: add /curator CommandDef with subcommands
- hermes_cli/main.py: register `hermes curator` subparser via register_cli() from hermes_cli.curator
- cli.py: /curator slash-command dispatch + startup hook
- gateway/run.py: gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt). Together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.

The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill` tools that are not registered as model tools. Swap the prompt to rely on the real surface:
- skill_manage action=patch: for patching and consolidation
- terminal: to `mv` skill dirs into .archive/

Also drop `pin` from the model's decision list: pinning is a user opt-out via `hermes curator pin <skill>`, not something the model should do autonomously. Decision list is now: keep / patch / consolidate / archive.

Tests updated: the prompt-invariant test now asserts that the existing tools are referenced and that bespoke tool names do NOT appear. A new test prevents `pin` from being re-added as a model decision.
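A pytest-style sketch of the prompt-invariant checks described in this commit. The assumptions:

- `REVIEW_PROMPT` is a hypothetical excerpt, not the curator's real prompt.
- The helper and test names are illustrative.
- Only the tool names (`skill_manage`, `terminal`, `archive_skill`, `pin_skill`) and the decision verbs come from the commit message.

```python
import re
from typing import List

# Hypothetical excerpt; the curator's real review prompt is not reproduced in the PR.
REVIEW_PROMPT = """\
For each cluster, decide: keep / patch / consolidate / archive.
Use skill_manage action=patch to patch or consolidate skills.
Use terminal to mv a skill directory into .archive/.
You MUST NOT pin skills; pinning is the user's decision.
"""

REAL_TOOLS = ("skill_manage", "terminal")
BESPOKE_TOOLS = ("archive_skill", "pin_skill")  # never registered as model tools


def decision_verbs(prompt: str) -> List[str]:
    """Pull the slash-separated decision list that follows 'decide:'."""
    match = re.search(r"decide:\s*([a-z /]+)\.", prompt)
    return [verb.strip() for verb in match.group(1).split("/")] if match else []


def check_prompt_invariants(prompt: str) -> None:
    """Raise AssertionError if the prompt drifts from the registered tool surface."""
    for tool in REAL_TOOLS:
        assert tool in prompt, f"prompt should reference {tool}"
    for tool in BESPOKE_TOOLS:
        assert tool not in prompt, f"prompt mentions unregistered tool {tool}"
    assert "pin" not in decision_verbs(prompt), "pin is a user opt-out, not a model decision"


def test_review_prompt_invariants() -> None:
    check_prompt_invariants(REVIEW_PROMPT)
```

Asserting the absence of the bespoke names is what keeps the prompt from drifting back to tools the model cannot actually call.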

Previous invariants only gated the primary entry points (apply_automatic_transitions, archive_skill, CLI pin). Several paths were unprotected:
- bump_view / bump_use / bump_patch / set_state / set_pinned wrote usage records unconditionally, which is confusing noise in .usage.json even though the review list filtered them out
- restore_skill did not check whether a bundled skill now shadows the archived name
- CLI unpin was asymmetric with CLI pin: it had no gate

Fixes:
- _mutate() (the shared counter / state writer) now drops silently when the skill is not agent-created. .usage.json never gains a record for a bundled or hub-installed skill.
- restore_skill() refuses to restore under a name that is now bundled or hub-installed (would shadow upstream).
- CLI unpin gate matches CLI pin.

New tests:
- 5 provenance-guard tests on skill_usage (one per mutator)
- 1 end-to-end test that hammers every mutator at a bundled skill and a hub skill, asserts both are untouched on disk, and asserts the sidecar stays clean
- 2 CLI tests proving pin/unpin refuse bundled skills symmetrically

64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).
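Routing every mutator through one guarded writer is what makes "drops silently" a single check rather than five. This sketch makes several assumptions:

- The `UsageSidecar` class is hypothetical.
- The sidecar is kept in memory here.
- The bool return value is an addition for testability.
- Only three of the five mutators are shown.
- The real _mutate() in tools/skill_usage.py writes .usage.json on disk.

```python
from typing import Any, Dict, Set


class UsageSidecar:
    """In-memory stand-in for .usage.json with a single guarded writer."""

    def __init__(self, bundled: Set[str], hub_installed: Set[str]) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}
        self._upstream = bundled | hub_installed

    def _mutate(self, skill: str, **updates: Any) -> bool:
        """Shared writer: silently drop writes for bundled or hub-installed skills."""
        if skill in self._upstream:
            return False  # no record is ever created for an upstream skill
        self.records.setdefault(skill, {}).update(updates)
        return True

    def bump_view(self, skill: str) -> bool:
        count = self.records.get(skill, {}).get("view_count", 0)
        return self._mutate(skill, view_count=count + 1)

    def set_state(self, skill: str, state: str) -> bool:
        return self._mutate(skill, state=state)

    def set_pinned(self, skill: str, pinned: bool) -> bool:
        return self._mutate(skill, pinned=pinned)
```

Adding a new mutator later cannot bypass the provenance check as long as it writes through `_mutate`.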

Weekly is closer to how skill churn actually works: most agent-created skills don't change multiple times per day, so a daily review is pure cost without benefit. Bumping the default to 7 days reduces aux-model spend while still catching drift and staleness on the timescales that matter (30d stale, 90d archive).

Changes:
- DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days)
- config.yaml default: interval_hours: 24 -> 24 * 7
- CLI status line renders as '7d' when interval is a whole-day multiple
- Test `test_old_run_eligible` decoupled from the exact default: it now uses 2 * get_interval_hours() so future tweaks don't break it
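A sketch of the status-line rendering and the decoupled test pattern. The assumptions:

- How the real CLI renders intervals that are not whole days is not stated, so the 'Nh' fallback is an assumption.
- `get_interval_hours` and `test_old_run_eligible` are names from the commit, but their bodies here are stand-ins.
- `format_interval` and `run_is_old_enough` are hypothetical.

```python
DEFAULT_INTERVAL_HOURS = 24 * 7  # 168 hours, the new default


def get_interval_hours() -> float:
    """Stand-in for the real config lookup; returns the default here."""
    return DEFAULT_INTERVAL_HOURS


def format_interval(hours: float) -> str:
    """Status-line rendering: whole-day multiples as 'Nd', anything else in hours."""
    if hours > 0 and hours % 24 == 0:
        return f"{int(hours // 24)}d"
    return f"{hours:g}h"


def run_is_old_enough(hours_since_last_run: float, interval_hours: float) -> bool:
    return hours_since_last_run > interval_hours


def test_old_run_eligible() -> None:
    # Derive "old" from the configured interval rather than hard-coding 24 or 168.
    assert run_is_old_enough(2 * get_interval_hours(), get_interval_hours())
```

Because the test scales with whatever get_interval_hours() returns, changing the default again only touches the constant.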

Long-running gateways need the curator to fire on cadence without restarts. Piggy-back on the existing cron ticker thread (which already runs image/document cache cleanup every hour on the same pattern) instead of spawning a dedicated timer thread.
- New CURATOR_EVERY = 60 ticks (poll hourly at the default 60s tick interval). The inner config.interval_hours gate controls the real cadence, so at the 7-day default roughly 167 of every 168 hourly pokes are cheap no-ops and one runs the review.
- Removed the boot-time call added in the prior commit: the ticker covers boot + every hour thereafter, which avoids double-running.

Handles the weekly-default-on-24/7-gateway gap flagged in review.
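The ticker pattern can be sketched as one loop that fires the curator check every CURATOR_EVERY ticks. The assumptions:

- The real gateway/run.py loop and its other jobs are not shown in the PR.
- The function name, the `threading.Event` stop signal, and firing on tick 0 (to cover boot) are assumptions.

```python
import threading
from typing import Callable

CURATOR_EVERY = 60  # ticks; at a 60 s tick this is one poke per hour


def cron_ticker(stop: threading.Event, tick_seconds: float,
                maybe_run_curator: Callable[[], None]) -> int:
    """Single ticker loop that also hosts the curator poke; no dedicated timer thread."""
    tick = 0
    while not stop.is_set():
        if tick % CURATOR_EVERY == 0:  # tick 0 covers boot, so no separate boot-time call
            maybe_run_curator()        # cheap no-op unless the interval and idle gates pass
        tick += 1
        stop.wait(tick_seconds)        # sleeps, but wakes immediately when stop is set
    return tick
```

In a gateway this would run on the existing daemon thread. The real cadence still comes from the interval gate inside maybe_run_curator, not from CURATOR_EVERY.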

…d iterations

Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output:

**Umbrella-first framing.** The original 'decide keep/patch/archive/consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar. The right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods.

**Three concrete consolidation methods.**
- (a) Merge into an existing umbrella (patch the broadest skill, archive siblings).
- (b) Create a new umbrella SKILL.md (skill_manage action=create).
- (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling.

This matches the support-file vocabulary the review-prompt side already uses (PR #17213).

**Two observed bailouts pre-empted:**
- 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count)
- 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar)

**Config-aware parent inheritance.** _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. The fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning. It runs on whatever the user is currently on, OAuth-backed or pool-backed included.

**Unbounded iteration ceiling.** max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999: the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget.

**Tests updated:**
- test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set. The umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label, since passive keep-everything is the failure mode being prevented.
- Added test_curator_review_prompt_is_umbrella_first, asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption.
- Added test_curator_review_prompt_offers_support_file_actions, asserting skill_manage action=create and action=write_file are both named.

**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying; typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched.

Full report in PR description.

+    print("curator: running review pass...")
+
+    def _on_summary(msg: str) -> None:
+        print(msg)


teknium1 and others added 6 commits April 28, 2026 22:03

github-advanced-security AI found potential problems Apr 29, 2026


Comment thread gateway/run.py Dismissed

Comment thread hermes_cli/curator.py


alt-glitch added labels Apr 29, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/cli (CLI entry point, hermes_cli/, setup wizard), tool/skills (Skills system: list, view, manage)

teknium1 merged commit fa9383d into main Apr 29, 2026
10 of 12 checks passed

teknium1 deleted the hermes/curator-salvage branch April 29, 2026 05:33

This was referenced Apr 29, 2026

feat(curator): background skill maintenance (issue #7816) #16049

Closed

[Feature]: Skill lifecycle management — usage metadata, staleness, archival, and revalidation #7816

Closed

github-actions Bot mentioned this pull request May 1, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.23 to v2026.4.30 Docker-Hub-sirmark/docker-hermes-agent#4

Merged

subinium mentioned this pull request May 11, 2026

feat(curator): autonomous skill curator — background grading + consolidation + pruning (Hermes v0.12 parity) subinium/CrowClaw#311

Open

6 tasks

fancpp mentioned this pull request May 14, 2026

Self-created skills lack mechanism-level guarantees for correctness and execution consistency #25833

Open
