Overview
Anthropic's skill-creator meta-skill reveals several practical, low-effort improvements to how Hermes Agent creates, triggers, and iteratively refines skills. Unlike our existing #337 (evolutionary self-improvement via automated pipelines) or #416 (structural validation), this issue focuses on day-to-day skill quality: making the agent write better skills, trigger them more reliably, and improve them during normal use.
The core insight from Anthropic's approach: the skill lifecycle should be a closed loop — create → use → observe gaps → refine → use again. Hermes has the primitives for this (skill_manage with patch, system prompt encouragement) but several tweaks would make the loop significantly tighter.
Source: anthropics/skills/skill-creator, specifically the SKILL.md methodology, improve_description.py, and run_loop.py for description optimization.
Research Findings
What Anthropic Does Well
1. "Pushy" Description Philosophy
Claude undertriggers skills by default — the model errs on the side of NOT loading a skill even when it's relevant. Anthropic's fix: descriptions should be slightly aggressive, explicitly listing edge cases and synonyms:
"Make sure to use this skill whenever the user mentions dashboards, reports, analytics, data viz, charts, or visualizations — even if they don't explicitly ask for a 'dashboard'."
2. Description Length Budget
Anthropic allows up to 1,024 characters for descriptions and recommends 100–200 words. The description is the only thing the model sees when deciding whether to trigger a skill — it needs room to convey trigger conditions, not just a sentence fragment.
3. Imperative + Explain Why
Skills should use imperative commands but explain the reasoning. From their guide:
"Today's LLMs are smart. They have good theory of mind... If you find yourself writing ALWAYS or NEVER in all caps, that's a yellow flag — reframe and explain the reasoning."
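The "yellow flag" in the quote above could be checked mechanically when a skill is created or patched. The following is a minimal sketch, not part of Hermes or Anthropic's tooling; the function name `find_shouted_absolutes` and the word list are assumptions made for illustration.

```python
import re

# Hypothetical word list: the quoted guidance only names ALWAYS and NEVER.
ABSOLUTES = ("ALWAYS", "NEVER")
_PATTERN = re.compile(r"\b(" + "|".join(ABSOLUTES) + r")\b")


def find_shouted_absolutes(skill_text: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for each all-caps ALWAYS/NEVER in the text.

    The match is case-sensitive on purpose: lowercase "always" is ordinary
    prose, while the all-caps form is the signal to reframe and explain why.
    """
    hits = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

A hit is only a prompt to rewrite the instruction with its reasoning, not an error in itself.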
4. Anti-Overfitting Guidance
"Don't make instructions too narrow to the test cases; aim for generalizability. Use metaphors and general patterns."
5. "Bundle Repeated Work"
If multiple uses of a skill result in the agent writing the same Python script, that script should be moved into the skill's scripts/ folder. This is a practical iterative refinement pattern.
6. Progressive Disclosure Awareness
Keep SKILL.md under 500 lines. Put large reference material in references/, not inline. Use the 3-level loading system consciously.
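The 500-line guideline is simple to check before a skill is saved. This is a hedged sketch only: `is_within_line_budget` is a hypothetical helper, and reading "under 500 lines" as a strict less-than comparison is an assumption.

```python
MAX_SKILL_LINES = 500  # guideline quoted above; strict "<" is an assumption


def is_within_line_budget(skill_md: str, max_lines: int = MAX_SKILL_LINES) -> bool:
    """Return True when the SKILL.md body has fewer than max_lines lines."""
    return len(skill_md.splitlines()) < max_lines
```

When the check fails, the guidance above suggests moving bulky reference material into references/ instead of trimming instructions.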
How This Maps to Hermes
| Anthropic Concept | Hermes Current State | Gap |
| --- | --- | --- |
| Description 100–200 words (1024 chars) | _read_skill_description() truncates to 60 chars | Descriptions are sentence fragments; insufficient for trigger decisions |
| "Pushy" description guidance | No guidance on description writing | Agent writes minimal descriptions by default |
| Post-use skill improvement | System prompt says "if a skill has issues, fix it with patch" | Reactive, not proactive; no guidance on WHAT to observe |
Current State in Hermes Agent
What We Already Have (and it's solid)
skill_manage tool with create/patch/edit/delete: the agent can modify skills mid-conversation
System prompt injection via build_skills_system_prompt(): automatic skill discovery
Progressive disclosure: description in system prompt → skill_view() for body → file_path for resources
Security scanning with rollback: way ahead of Anthropic
Skills Hub with multi-source federation: distribution solved
CONTRIBUTING.md with skill vs. tool criteria: decision framework exists
What's Missing (scope of this issue)
Description budget is too tight — 60 chars is a sentence fragment ("Expert guidance for fine-tuning LLMs with Axolotl - YAML ...")
No guidance on writing triggerable descriptions — agent doesn't know descriptions need to be "pushy"
Passive improvement loop — agent only patches when something actively breaks, doesn't proactively improve after use
No skill writing principles in the prompting — "explain why", "don't overfit", "bundle repeated work" are absent
Implementation Plan
Skill vs. Tool Classification
This is not a skill or tool — it's a set of improvements to existing codebase components: prompt_builder.py (description length + system prompt guidance), skill_manager_tool.py (schema description guidance), and CONTRIBUTING.md (documentation). All changes are to constants, strings, and documentation.
What We'd Need
No new dependencies. No new files. Changes to 3 existing files.
Phased Rollout
Phase 1: Description & Triggering Improvements (< 1 hour)
Increase description budget in system prompt: Change _read_skill_description(max_chars=60) to max_chars=200 in prompt_builder.py:117. This gives the model roughly 3x more context per skill for trigger decisions. System prompt growth is bounded: ~90 skills × 140 extra chars = ~12K chars, which is acceptable.
Add "pushy description" guidance to skill_manage schema — Append to the schema description:
Write DESCRIPTIONS that are slightly aggressive about triggering — list
synonyms, edge cases, and adjacent tasks the skill covers. The description
is the ONLY thing seen when deciding whether to load a skill. Example:
"Use this skill whenever the user mentions dashboards, reports, analytics,
data viz, charts — even if they don't explicitly ask for one."
Update CONTRIBUTING.md skill-writing section with description best practices.
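The budget change in the first item can be sanity-checked with a small sketch. The real `_read_skill_description()` in prompt_builder.py is not shown in this issue, so `truncate_description` and `extra_prompt_chars` below are hypothetical stand-ins; the "clip and append `...`" behavior is an assumption, chosen because it matches the 60-character example quoted in this issue.

```python
def truncate_description(description: str, max_chars: int = 200) -> str:
    """Clip a description to max_chars, ending in '...' when it is clipped.

    Assumes max_chars >= 3 so there is room for the ellipsis.
    """
    text = " ".join(description.split())  # collapse newlines and repeated spaces
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3] + "..."


def extra_prompt_chars(skill_count: int, old_max: int = 60, new_max: int = 200) -> int:
    """Worst-case system prompt growth if every description fills the new budget."""
    return skill_count * (new_max - old_max)
```

With the numbers from this issue, `extra_prompt_chars(90)` gives 12,600 characters, which is the "~12K" figure. That is an upper bound: descriptions shorter than 200 characters add less.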
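Phase 2: Proactive Post-Use Improvement Loop (< 30 min)
Enhance system prompt guidance: Replace the current passive SKILLS_GUIDANCE constant.
Add post-use fix hint to build_skills_system_prompt() output: After the existing "If a skill has issues, fix it with skill_manage(action='patch')" line, add: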
After using a skill successfully, improve it: add missing steps, update
outdated info, move repeated boilerplate into scripts/. Skills improve
through use.
Phase 3: Skill Writing Principles (< 30 min)
Add writing principles to the skill_manage schema — Extend the "Good skills" guidance:
Good skills: trigger conditions, numbered steps with exact commands,
pitfalls section, verification steps. Use imperative commands but
explain WHY behind each instruction — models reason better with
context. Keep SKILL.md under 500 lines; put large references in
references/. Don't overfit instructions to one scenario — write
for the general case. If you keep generating the same helper code
when using a skill, move it into the skill's scripts/ folder.
Pros & Cons
Pros
Zero new infrastructure — All changes are to constants and strings in 3 files
Immediate impact — Better descriptions → better triggering on the next session
Compounds over time — Proactive improvement loop means every skill use makes skills better
Learned from production — Anthropic's patterns come from operating skills at scale (84.5K stars, production Claude deployment)
Cons / Risks
System prompt growth: Increasing description length from 60→200 adds ~12K chars for ~90 skills. Need to monitor context usage. Could mitigate with embedding-based pre-filtering later.
Proactive patching noise — Agent might over-eagerly patch skills after every use. The guidance should emphasize "only if genuinely improved" not "always patch."
Instruction bloat — Adding more guidance to the skill_manage schema and system prompt costs context tokens. Must keep additions concise.
Open Questions
Should we cap description at 200 or go to the full 1024 like Anthropic? 200 is a pragmatic middle ground for system prompt size, but we could also consider dynamic truncation based on total skill count.
Should we add a skill_used counter or timestamp to skills metadata to track usage frequency? This would enable data-driven decisions about which skills to improve first (light lift, could be Phase 4).
Is there value in adding an explicit "trigger conditions" YAML field separate from description? E.g., triggers: ["dashboard", "data viz", "chart"] for structured matching vs. relying on free-text descriptions.
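For the first open question, dynamic truncation could take roughly the following shape. This is a sketch under stated assumptions: `per_skill_max_chars` is a hypothetical helper, and the 18,000-character total budget is an illustrative value chosen so that ~90 skills yields the proposed 200-character cap. The floor (60) and ceiling (1024) are the current Hermes limit and Anthropic's limit as cited in this issue.

```python
def per_skill_max_chars(
    skill_count: int,
    total_budget: int = 18_000,  # hypothetical total chars reserved for descriptions
    floor: int = 60,             # current Hermes truncation length
    ceiling: int = 1024,         # Anthropic's description limit
) -> int:
    """Split a fixed description budget evenly across skills, within bounds."""
    if skill_count <= 0:
        return ceiling
    share = total_budget // skill_count
    return max(floor, min(ceiling, share))
```

With few skills installed, each description gets the full 1,024 characters; as the count grows, the per-skill cap shrinks toward the current 60-character floor, which keeps total system prompt size roughly constant.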
References
prompt_builder.py:117, _read_skill_description(max_chars=60): the 60-char truncation
skill_manager_tool.py:517-536, SKILL_MANAGE_SCHEMA: current creation guidance