feat(curator): background skill maintenance (issue #7816) #17277

Merged
teknium1 merged 6 commits into main from hermes/curator-salvage
Apr 29, 2026
Conversation

@teknium1

Contributor

Summary

Salvages the Curator from #16049 onto current main with a sharper prompt and fixed parent-config inheritance. Closes the loop on issue #7816 paired with the creation-side class-first prompt from #17213.

How it works

Default: enabled, inactivity-triggered (no cron daemon). On CLI startup and gateway boot:

  1. Check config + .curator_state — run if last_run_at is older than interval_hours (default 7 days) AND agent has been idle ≥ min_idle_hours (default 2).
  2. Automatic state transitions (pure, no LLM): mark stale (>30d unused), archive (>90d unused), reactivate stale skills that were used again.
  3. LLM umbrella-building pass: spawns a forked AIAgent (inheriting the user's main provider/model) and hands it the agent-created candidate list plus usage stats. The model uses existing tools only: skills_list/skill_view to survey, skill_manage (patch/create/write_file) to consolidate, and terminal to archive.
  4. Persist .curator_state with a summary.
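
The gate in step 1 can be sketched roughly as below. This is an illustration only, not the code in agent/curator.py: the function name `should_run_curator`, its parameters, and the treatment of missing timestamps are assumptions; only the two defaults (168 hours, 2 hours) come from this description.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_curator(
    now: datetime,
    last_run_at: Optional[datetime],
    last_activity_at: Optional[datetime],
    interval_hours: float = 168.0,  # 7 days, the default stated above
    min_idle_hours: float = 2.0,    # default stated above
) -> bool:
    """Return True only when both the interval gate and the idle gate pass."""
    # Assumption: a missing last_run_at means the curator never ran,
    # so the interval gate passes.
    interval_ok = (
        last_run_at is None
        or now - last_run_at > timedelta(hours=interval_hours)
    )
    # Assumption: no recorded activity counts as idle.
    idle_ok = (
        last_activity_at is None
        or now - last_activity_at >= timedelta(hours=min_idle_hours)
    )
    return interval_ok and idle_ok
```

Both conditions must hold, so a run that is overdue is still deferred while the agent is in active use.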

Invariants (all load-bearing)

  • Never touches bundled or hub-installed skills. Double-filtered against .bundled_manifest AND .hub/lock.json.
  • Never auto-deletes. Archive only — archives live in ~/.hermes/skills/.archive/ and are recoverable via hermes curator restore <skill>.
  • Pinned skills bypass all auto-transitions. Pinning is user-initiated; the model never pins autonomously.
  • Forked AIAgent inherits parent config. No re-resolution from env vars — works with OAuth-only providers and pool-backed credentials.
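
To make the archive-only invariant concrete, here is a minimal sketch of a move-based archive and restore. The helper names `move_to_archive` and `move_from_archive` are hypothetical stand-ins, not the helpers in tools/skill_usage.py, and the sketch assumes each skill is a directory directly under the skills root.

```python
import shutil
from pathlib import Path


def move_to_archive(skills_root: Path, name: str) -> Path:
    """Move an active skill directory into .archive/ instead of deleting it."""
    src = skills_root / name
    if not src.is_dir():
        raise FileNotFoundError(f"no active skill named {name!r}")
    archive_dir = skills_root / ".archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / name
    if dest.exists():
        # Refuse to overwrite an earlier archive of the same name.
        raise FileExistsError(f"archive already contains {name!r}")
    shutil.move(str(src), str(dest))
    return dest


def move_from_archive(skills_root: Path, name: str) -> Path:
    """Move an archived skill back to the active skills directory."""
    src = skills_root / ".archive" / name
    if not src.is_dir():
        raise FileNotFoundError(f"no archived skill named {name!r}")
    dest = skills_root / name
    if dest.exists():
        # Refuse to clobber an active skill that now uses this name.
        raise FileExistsError(f"an active skill named {name!r} already exists")
    shutil.move(str(src), str(dest))
    return dest
```

Because both directions are plain moves, nothing is ever unlinked and a restore returns the skill's files unchanged.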

CLI surface

hermes curator status             # config + skill state + least-recently-used top-5
hermes curator run [--sync]       # trigger a pass now (sync = block until LLM done)
hermes curator pause / resume
hermes curator pin <skill>        # block auto-transitions on this skill
hermes curator unpin <skill>
hermes curator restore <skill>    # move from .archive/ back to active

Changes vs #16049 (what this salvage adds)

Umbrella-first prompt

The original prompt let the reviewer default to keep whenever two skills weren't byte-identical. Rewritten to push active umbrella-building:

  • Framing: 'UMBRELLA-BUILDING consolidation pass, not a passive audit'
  • Explicit bar: 'would a maintainer write this as N separate skills, or one skill with N labeled subsections?'
  • Three consolidation methods: merge into existing umbrella / create new umbrella SKILL.md / demote to references/ templates/ scripts/ support file
  • Pre-empts observed bailouts: 'usage counters are zero' (judge on content, not use_count) and 'each has a distinct trigger' (pairwise distinctness is the wrong bar)

Config-aware parent fork

_run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. The fork now calls load_config() + resolve_runtime_provider() and passes explicit provider, model, api_key, base_url, api_mode so it runs on whatever the user is currently on.

Unbounded iteration ceiling

max_iterations=8 was way too low — a live umbrella pass takes 50-100 API calls. Raised to 9999; the natural stopping criterion is 'no more clusters worth processing'.

Changes

| File | What |
|---|---|
| tools/skill_usage.py (new) | Sidecar .usage.json I/O, atomic writes, provenance filter, archive/restore helpers |
| agent/curator.py (new) | Orchestrator: config, idle gating, state-machine transitions, forked-agent LLM pass with umbrella-first prompt |
| hermes_cli/curator.py (new) | hermes curator {status,run,pause,resume,pin,unpin,restore} subcommand |
| tests/tools/test_skill_usage.py (new) | 29 unit tests |
| tests/agent/test_curator.py (new) | 29 unit tests (incl. 2 new umbrella-first prompt contracts) |
| tools/skills_tool.py | Wrap skill_view registry handler to bump view_count on success |
| tools/skill_manager_tool.py | Bump patch_count on patch/edit/write_file/remove_file; forget() record on delete |
| hermes_cli/config.py | Add curator: section to DEFAULT_CONFIG (additive; no version bump needed) |
| hermes_cli/commands.py | Add /curator CommandDef with subcommand hints |
| hermes_cli/main.py | Register hermes curator argparse subparser |
| gateway/run.py | Hook maybe_run_curator into the existing cron ticker thread |
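
For readers unfamiliar with the "atomic writes" mentioned for the .usage.json sidecar, a common pattern is to write to a temp file and rename it over the target. The sketch below shows that general pattern only; `write_json_atomic` is a hypothetical name and this is not necessarily how tools/skill_usage.py implements it.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON so readers see the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A crash mid-write then leaves at worst a stray temp file, never a truncated sidecar.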

Validation — live run on author's own setup

Three successive test runs against 346 agent-created skills on my own ~/.hermes/skills/:

| Run | Prompt | Archives | Patches | New umbrellas | Outcome |
|---|---|---|---|---|---|
| 1 | original (#16049) | 3 | 0 | 0 | 'surveyed, no action' (passive) |
| 2 | consolidation | 44 | 3 | 0 | Found 50-skill MLOps dup bug, umbrella'd nothing |
| 3 | umbrella-first | 249 | 3 | 18 | Agent-created skills 346 → 118 (-66%); every archived skill's content preserved as references/ under its umbrella |

Run 3 on opus-4.7 via OpenRouter: 86 API calls, ~6.5 min, 99% cache hit rate on later calls, ~$4-7. Pinned skill (hermes-agent-dev) untouched. No deletions — archives recoverable via hermes curator restore.

66/66 curator + skill_usage tests pass. 17/17 review-prompt tests still pass (no regression on #17213).

Attribution

Cherry-picked 5 commits from hermes/curator-infra preserving Teknium's authorship; 1 follow-up commit applies the umbrella-first prompt + config-aware fork improvements.

Paired with creation-side #17213 (class-first review prompt), this closes issue #7816: the creation side prefers patching/references over new narrow skills, the curator side does umbrella-building over time.

teknium1 and others added 6 commits April 28, 2026 22:03
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
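
The active → stale → archived lifecycle can be sketched as a pure function. This is an illustration, not the code in agent/curator.py: `next_state` and its parameters are hypothetical, and the 30-day and 90-day thresholds are taken from the figures quoted elsewhere in this PR.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)    # threshold quoted in this PR
ARCHIVE_AFTER = timedelta(days=90)  # threshold quoted in this PR


def next_state(
    state: str, last_used_at: datetime, now: datetime, pinned: bool = False
) -> str:
    """Return the skill's next lifecycle state; no LLM involved."""
    if pinned:
        return state  # pinned skills bypass all auto-transitions
    if state == "archived":
        return state  # assumption: un-archiving is a manual restore only
    unused_for = now - last_used_at
    if unused_for > ARCHIVE_AFTER:
        return "archived"
    if unused_for > STALE_AFTER:
        return "stale"
    # Used within the stale window: a stale skill becomes active again.
    return "active" if state == "stale" else state
```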

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
  .hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
  via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
  provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
  transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
  pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests

Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
  patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
  register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.

Refs #7816.
The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill`
tools that are not registered as model tools. Swap the prompt to rely
on the real surface:

  - skill_manage action=patch  — for patching and consolidation
  - terminal                   — to `mv` skill dirs into .archive/

Also drop `pin` from the model's decision list — pinning is a user
opt-out for `hermes curator pin <skill>`, not something the model
should do autonomously.

Decision list is now: keep / patch / consolidate / archive.

Tests updated: prompt-invariant test now asserts the existing tools
are referenced and that bespoke tool names do NOT appear. New test
prevents `pin` from being re-added as a model decision.
Previous invariants only gated the primary entry points
(apply_automatic_transitions, archive_skill, CLI pin). Several paths
were unprotected:

  - bump_view / bump_use / bump_patch / set_state / set_pinned wrote
    usage records unconditionally, which is confusing noise in
    .usage.json even though the review list filtered them out
  - restore_skill did not check whether a bundled skill now shadows
    the archived name
  - CLI unpin was asymmetric with CLI pin — it had no gate

Fixes:
  - _mutate() (the shared counter / state writer) now drops silently
    when the skill is not agent-created. .usage.json never gains a
    record for a bundled or hub-installed skill.
  - restore_skill() refuses to restore under a name that is now
    bundled or hub-installed (would shadow upstream).
  - CLI unpin gate matches CLI pin.
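
A rough sketch of the "drop silently" guard described for the shared writer follows. The names `is_agent_created` and `bump_counter` and the in-memory dict are assumptions for illustration; the real _mutate() reads the manifests and writes the sidecar itself.

```python
from typing import Dict, Set


def is_agent_created(name: str, bundled: Set[str], hub_installed: Set[str]) -> bool:
    """Double filter: a skill qualifies only if neither manifest lists it."""
    return name not in bundled and name not in hub_installed


def bump_counter(
    usage: Dict[str, Dict[str, int]],
    name: str,
    field: str,
    bundled: Set[str],
    hub_installed: Set[str],
) -> bool:
    """Increment a usage counter, silently skipping non-agent-created skills."""
    if not is_agent_created(name, bundled, hub_installed):
        return False  # no record is ever created for bundled/hub skills
    record = usage.setdefault(name, {})
    record[field] = record.get(field, 0) + 1
    return True
```

Putting the check in the one shared writer means every mutator inherits it, instead of each caller having to remember the filter.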

New tests:
  - 5 provenance-guard tests on skill_usage (one per mutator)
  - 1 end-to-end test that hammers every mutator at a bundled skill
    and a hub skill, asserts both are untouched on disk, and asserts
    the sidecar stays clean
  - 2 CLI tests proving pin/unpin refuse bundled skills symmetrically

64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).
Weekly is closer to how skill churn actually works — most agent-created
skills don't change multiple times per day, so a daily review is pure
cost without benefit. Bumping the default to 7 days reduces aux-model
spend while still catching drift and staleness on the timescales that
matter (30d stale, 90d archive).

Changes:
- DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days)
- config.yaml default: interval_hours: 24 -> 24 * 7
- CLI status line renders as '7d' when interval is a whole-day multiple
- Test `test_old_run_eligible` decoupled from the exact default: it now
  uses 2 * get_interval_hours() so future tweaks don't break it
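
The status-line rendering could look something like this. `format_interval` is a hypothetical name, and the hours fallback for intervals that are not whole-day multiples is an assumption; the commit only states the whole-day case.

```python
def format_interval(hours: int) -> str:
    """Render whole-day multiples as days, e.g. 168 -> '7d'."""
    if hours > 0 and hours % 24 == 0:
        return f"{hours // 24}d"
    # Assumption: anything else is shown in hours.
    return f"{hours}h"
```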
Long-running gateways need the curator to fire on cadence without
restarts. Piggy-back on the existing cron ticker thread (which already
runs image/document cache cleanup every hour on the same pattern)
instead of spawning a dedicated timer thread.

- New CURATOR_EVERY = 60 ticks (poll hourly at default 60s interval).
  The inner config.interval_hours gate controls the real cadence, so at
  the 168-hour default roughly 167 of every 168 hourly pokes are cheap
  no-ops and one runs the review.
- Removed the boot-time call added in the prior commit — the ticker
  covers boot + every hour thereafter. Avoids double-running.
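
The piggy-back pattern can be sketched as a modulo check inside an existing tick loop. This is a simplified model, not the gateway/run.py code: `run_ticks` is hypothetical, and it assumes the very first tick fires so that boot is covered.

```python
from typing import Callable

CURATOR_EVERY = 60  # ticks; hourly at the default 60s tick interval


def run_ticks(
    n_ticks: int,
    maybe_run_curator: Callable[[], None],
    every: int = CURATOR_EVERY,
) -> int:
    """Simulate n_ticks ticker iterations; return how often the curator was poked."""
    pokes = 0
    for tick in range(n_ticks):
        # Assumption: tick 0 fires, covering boot, then every `every` ticks.
        if tick % every == 0:
            maybe_run_curator()  # the interval/idle gate inside decides real work
            pokes += 1
    return pokes
```

The poke itself stays cheap because the real cadence is decided by the config gate inside maybe_run_curator, not by the ticker.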

Handles the weekly-default-on-24/7-gateway gap flagged in review.
…d iterations

Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:

**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.

**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).

**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).

**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.

**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.

**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.

**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
Comment thread gateway/run.py Dismissed
Comment thread hermes_cli/curator.py
print("curator: running review pass...")

def _on_summary(msg: str) -> None:
    print(msg)
@alt-glitch added labels on Apr 29, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/cli (CLI entry point, hermes_cli/, setup wizard), tool/skills (Skills system: list, view, manage)
@teknium1 teknium1 merged commit fa9383d into main Apr 29, 2026
10 of 12 checks passed
@teknium1 teknium1 deleted the hermes/curator-salvage branch April 29, 2026 05:33

4 participants