Feature: Skill Lifecycle Quality — Better Descriptions, Proactive Improvement Loop, and Writing Principles (inspired by Anthropic skill-creator)

## Overview

Anthropic's [skill-creator](https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md) meta-skill reveals several practical, low-effort improvements to how Hermes Agent creates, triggers, and iteratively refines skills. Unlike our existing #337 (evolutionary self-improvement via automated pipelines) or #416 (structural validation), this issue focuses on **day-to-day skill quality** — making the agent write better skills, trigger them more reliably, and improve them during normal use.

The core insight from Anthropic's approach: *the skill lifecycle should be a closed loop* — create → use → observe gaps → refine → use again. Hermes has the primitives for this (skill_manage with patch, system prompt encouragement) but several tweaks would make the loop significantly tighter.

**Source:** [anthropics/skills/skill-creator](https://github.com/anthropics/skills/tree/main/skills/skill-creator) — specifically the SKILL.md methodology, `improve_description.py`, and `run_loop.py` for description optimization.

---

## Research Findings

### What Anthropic Does Well

**1. "Pushy" Description Philosophy**

Claude undertriggers skills by default — the model errs on the side of NOT loading a skill even when it's relevant. Anthropic's fix: descriptions should be slightly aggressive, explicitly listing edge cases and synonyms:

> "Make sure to use this skill whenever the user mentions dashboards, reports, analytics, data viz, charts, or visualizations — even if they don't explicitly ask for a 'dashboard'."

**2. Description Length Budget**

Anthropic allows up to **1,024 characters** for descriptions and recommends 100–200 words. The description is the only thing the model sees when deciding whether to trigger a skill — it needs room to convey trigger conditions, not just a sentence fragment.

**3. Imperative + Explain Why**

Skills should use imperative commands but explain the reasoning. From their guide:

> "Today's LLMs are smart. They have good theory of mind... If you find yourself writing ALWAYS or NEVER in all caps, that's a yellow flag — reframe and explain the reasoning."

**4. Anti-Overfitting Guidance**

> "Don't make instructions too narrow to the test cases; aim for generalizability. Use metaphors and general patterns."

**5. "Bundle Repeated Work"**

If multiple uses of a skill result in the agent writing the same Python script, that script should be moved into the skill's `scripts/` folder. This is a practical iterative refinement pattern.

**6. Progressive Disclosure Awareness**

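Keep SKILL.md under 500 lines. Put large reference material in `references/`, not inline. Use the three-level loading system deliberately: the description is always in context, the SKILL.md body loads only when the skill triggers, and bundled resources load only when needed.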

### How This Maps to Hermes

| Anthropic Concept | Hermes Current State | Gap |
|:---|:---|:---|
| Description 100–200 words (1024 chars) | `_read_skill_description()` truncates to **60 chars** | Descriptions are sentence fragments; insufficient for trigger decisions |
| "Pushy" description guidance | No guidance on description writing | Agent writes minimal descriptions by default |
| Post-use skill improvement | System prompt says "if a skill has issues, fix it with patch" | Reactive, not proactive; no guidance on WHAT to observe |
| Skill writing principles | skill_manage schema says "trigger conditions, numbered steps, pitfalls, verification" | Good but missing: explain-why, anti-overfitting, bundle-repeated-work, progressive disclosure awareness |
| Description optimization loop | No equivalent | Out of scope here (see #337), but the GUIDANCE is adoptable |
| Skill testing framework | No equivalent | Out of scope here (see #416 for validation) |

---

## Current State in Hermes Agent

### What We Already Have (and it's solid)

- **`skill_manage` tool** with create/patch/edit/delete — the agent can modify skills mid-conversation
- **System prompt injection** via `build_skills_system_prompt()` — automatic skill discovery
- **Progressive disclosure** — description in system prompt → `skill_view()` for body → `file_path` for resources
- **Security scanning** with rollback — way ahead of Anthropic
- **Skills Hub** with multi-source federation — distribution solved
- **CONTRIBUTING.md** with skill vs. tool criteria — decision framework exists

### What's Missing (scope of this issue)

1. **Description budget is too tight** — 60 chars is a sentence fragment (`"Expert guidance for fine-tuning LLMs with Axolotl - YAML ..."`)
2. **No guidance on writing triggerable descriptions** — agent doesn't know descriptions need to be "pushy"
3. **Passive improvement loop** — agent only patches when something actively breaks, doesn't proactively improve after use
4. **No skill writing principles** in the prompting — "explain why", "don't overfit", "bundle repeated work" are absent

---

## Implementation Plan

### Skill vs. Tool Classification

This is **not a skill or tool** — it's a set of improvements to existing codebase components: `prompt_builder.py` (description length + system prompt guidance), `skill_manager_tool.py` (schema description guidance), and `CONTRIBUTING.md` (documentation). All changes are to constants, strings, and documentation.

### What We'd Need

No new dependencies. No new files. Changes to 3 existing files.

### Phased Rollout

**Phase 1: Description & Triggering Improvements** (< 1 hour)

1. **Increase description budget in system prompt** — Change `_read_skill_description(max_chars=60)` to `max_chars=200` in `prompt_builder.py:117`. This gives the model more than 3x as much text per skill for trigger decisions. System prompt growth is bounded: in the worst case, ~90 skills × 140 extra chars = 12,600 chars (~12K), which is acceptable.

2. **Add "pushy description" guidance to `skill_manage` schema** — Append to the schema description:
   ```
   Write DESCRIPTIONS that are slightly aggressive about triggering — list
   synonyms, edge cases, and adjacent tasks the skill covers. The description
   is the ONLY thing seen when deciding whether to load a skill. Example:
   "Use this skill whenever the user mentions dashboards, reports, analytics,
   data viz, charts — even if they don't explicitly ask for one."
   ```

3. **Update CONTRIBUTING.md** skill-writing section with description best practices.
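
To make the budget change in step 1 concrete, here is a minimal sketch of word-boundary truncation together with the growth arithmetic. `truncate_description` and `estimate_growth` are hypothetical helpers written for illustration; they are not the actual body of `_read_skill_description()`, which this issue does not show, and the ~90-skill count is the estimate used above.

```python
def truncate_description(text: str, max_chars: int = 200) -> str:
    """Collapse whitespace and cut at a word boundary, appending '...' if cut.

    Hypothetical helper; assumes max_chars is comfortably larger than 3.
    """
    text = " ".join(text.split())
    if len(text) <= max_chars:
        return text
    # Reserve 3 chars for the ellipsis, then back up to the last full word.
    cut = text[: max_chars - 3]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:-") + "..."


def estimate_growth(num_skills: int, old_max: int = 60, new_max: int = 200) -> int:
    """Worst-case extra prompt chars if every description fills the new budget."""
    return num_skills * (new_max - old_max)
```

Cutting at a word boundary avoids fragments like `YAML ...` landing mid-token, and `estimate_growth(90)` reproduces the 12,600-character worst case quoted in step 1.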

**Phase 2: Proactive Post-Use Improvement Loop** (< 30 min)

4. **Enhance system prompt guidance** — Replace the current passive `SKILLS_GUIDANCE` constant:
   ```python
   # Current:
   SKILLS_GUIDANCE = (
       "After completing a complex task (5+ tool calls), fixing a tricky error, "
       "or discovering a non-trivial workflow, consider saving the approach as a "
       "skill with skill_manage so you can reuse it next time."
   )

   # Proposed (adds post-use improvement):
   SKILLS_GUIDANCE = (
       "After completing a complex task (5+ tool calls), fixing a tricky error, "
       "or discovering a non-trivial workflow, consider saving the approach as a "
       "skill with skill_manage so you can reuse it next time.\n\n"
       "After USING a skill, evaluate: did it have missing steps, outdated "
       "commands, unclear instructions, or repeated boilerplate you wrote by "
       "hand? If so, patch the skill immediately with the improvements. "
       "Skills should get better every time they're used."
   )
   ```

5. **Add post-use fix hint to `build_skills_system_prompt()` output** — After the existing "If a skill has issues, fix it with skill_manage(action='patch')" line, add:
   ```
   After using a skill successfully, improve it: add missing steps, update
   outdated info, move repeated boilerplate into scripts/. Skills improve
   through use.
   ```

**Phase 3: Skill Writing Principles** (< 30 min)

6. **Add writing principles to the `skill_manage` schema** — Extend the "Good skills" guidance:
   ```
   Good skills: trigger conditions, numbered steps with exact commands,
   pitfalls section, verification steps. Use imperative commands but
   explain the WHY behind each instruction, since models reason better with
   context. Keep SKILL.md under 500 lines; put large references in
   references/. Don't overfit instructions to one scenario — write
   for the general case. If you keep generating the same helper code
   when using a skill, move it into the skill's scripts/ folder.
   ```
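
The principles above can also be spot-checked mechanically. The following is only an illustrative sketch: `check_skill_principles` is a hypothetical helper, not an existing Hermes function, and real structural validation is the subject of #416. The thresholds come from the figures cited earlier in this issue (500 lines, 1,024 characters, all-caps ALWAYS/NEVER as a yellow flag).

```python
import re

MAX_SKILL_LINES = 500         # "Keep SKILL.md under 500 lines"
MAX_DESCRIPTION_CHARS = 1024  # Anthropic's description limit cited above
SHOUTING = re.compile(r"\b(ALWAYS|NEVER)\b")  # all-caps directives only


def check_skill_principles(skill_md: str, description: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    line_count = len(skill_md.splitlines())
    if line_count >= MAX_SKILL_LINES:
        warnings.append(
            f"SKILL.md has {line_count} lines; move large material into references/"
        )
    if len(description) > MAX_DESCRIPTION_CHARS:
        warnings.append(
            f"description is {len(description)} chars; limit is {MAX_DESCRIPTION_CHARS}"
        )
    hits = SHOUTING.findall(skill_md)
    if hits:
        warnings.append(
            f"{len(hits)} all-caps ALWAYS/NEVER found; consider explaining the reasoning"
        )
    return warnings
```

The checks only produce warnings rather than rejecting a skill, because these are writing guidelines and not hard constraints.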

---

## Pros & Cons

### Pros
- **Zero new infrastructure** — All changes are to constants and strings in 3 files
- **Immediate impact** — Better descriptions → better triggering on the next session
- **Compounds over time** — Proactive improvement loop means every skill use makes skills better
- **Learned from production** — Anthropic's patterns come from operating skills at scale (84.5K stars, production Claude deployment)
- **Compatible with #337** — If/when we build automated evolution, better starting skills = faster convergence

### Cons / Risks
- **System prompt growth** — Increasing description length from 60→200 adds ~12K chars for ~90 skills. Need to monitor context usage. Could mitigate with embedding-based pre-filtering later.
- **Proactive patching noise** — Agent might over-eagerly patch skills after every use. The guidance should emphasize "only if genuinely improved" not "always patch."
- **Instruction bloat** — Adding more guidance to the skill_manage schema and system prompt costs context tokens. Must keep additions concise.

---

## Open Questions

- Should we cap descriptions at 200 chars or go to the full 1,024 like Anthropic? 200 is a pragmatic middle ground for system prompt size, but we could also consider dynamic truncation based on total skill count.
- Should we add a `skill_used` counter or timestamp to skill metadata to track usage frequency? This would enable data-driven decisions about which skills to improve first (light lift, could be Phase 4).
- Is there value in adding an explicit "trigger conditions" YAML field separate from `description`? E.g., `triggers: ["dashboard", "data viz", "chart"]` for structured matching vs. relying on free-text descriptions.
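
As one possible shape for the dynamic truncation raised in the first question, here is a minimal sketch that splits a fixed character budget evenly across skills. `description_budget` is a hypothetical function, and the 18,000-character total is an assumption chosen so that ~90 skills get 200 chars each; the floor and ceiling reuse the 60 and 1,024 figures from this issue.

```python
def description_budget(
    num_skills: int,
    total_chars: int = 18_000,  # assumption: 90 skills x 200 chars
    floor: int = 60,            # current truncation length
    ceiling: int = 1024,        # Anthropic's maximum
) -> int:
    """Per-skill description budget: an even split, clamped to [floor, ceiling]."""
    if num_skills <= 0:
        return ceiling
    return max(floor, min(ceiling, total_chars // num_skills))
```

With these defaults, a small install gets full-length descriptions while a very large one degrades toward today's 60-char behavior instead of growing the system prompt without bound.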

---

## References

- [Anthropic skill-creator SKILL.md](https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md) — Source of the iterative development methodology
- [Anthropic improve_description.py](https://github.com/anthropics/skills/blob/main/skills/skill-creator/scripts/improve_description.py) — Automated description refinement using failure analysis
- [Anthropic run_loop.py](https://github.com/anthropics/skills/blob/main/skills/skill-creator/scripts/run_loop.py) — Description optimization with train/test splits
- [Anthropic quick_validate.py](https://github.com/anthropics/skills/blob/main/skills/skill-creator/scripts/quick_validate.py) — Structural validation (relevant to #416)
- Hermes #337 — Evolutionary Self-Improvement (automated pipeline, larger scope)
- Hermes #416 — Skill Validation & Linting (structural checks, complementary)
- `prompt_builder.py:117` — `_read_skill_description(max_chars=60)` — the 60-char truncation
- `skill_manager_tool.py:517-536` — `SKILL_MANAGE_SCHEMA` — current creation guidance

