"A true master teaches not by telling, but by refining." - The Skill Sensei
Sensei automates the improvement of Agent Skills frontmatter compliance using the Ralph loop pattern - iteratively improving skills until they reach Medium-High compliance with all tests passing.
- Overview
- Quick Start
- Prerequisites
- How It Works
- Configuration
- Scoring Criteria
- Examples
- Troubleshooting
- Contributing
The frontmatter audit of the 26 Azure skills found:
- 0% High adherence - no skill has triggers, anti-triggers, and a compatibility field
- 46% Low adherence - 12 skills have minimal descriptions without clear triggers
- 0 of 26 skills have anti-triggers - no skill tells agents when NOT to use it
This leads to skill collision - agents invoking the wrong skill for a given prompt.
Sensei implements the "Ralph Wiggum" technique:
- Read - Load the skill's current state and token count
- Score - Evaluate frontmatter compliance
- Improve - Add triggers, anti-triggers, compatibility
- Verify - Run tests to ensure changes work
- Validate References - Check markdown links are valid
- Check Tokens - Analyze token usage, gather suggestions
- Summary - Display before/after with suggestions
- Prompt - Ask user: Commit, Create Issue, or Skip?
- Repeat - Until target score reached
Example prompts for invoking Sensei in Copilot CLI:
Run sensei on appinsights-instrumentation
Run sensei on appinsights-instrumentation --skip-integration
Run sensei on azure-security, azure-observability
Run sensei on all Low-adherence skills
Run sensei on all skills
| Flag | Description |
|---|---|
| `--skip-integration` | Skip integration tests for faster iteration (unit + trigger tests only) |
⚠️ Note: Using `--skip-integration` speeds up the loop significantly but may miss runtime issues. Consider running the full test suite before the final commit.
- Copilot CLI - Installed and authenticated
  copilot --version
- Node.js - For running tests
  node --version  # v18+ recommended
- Git - For commits
  git --version
# Install test dependencies
cd tests
npm install
# Verify tests run
npm test -- --testPathPatterns=azure-validation
┌─────────────────────────────────────────────────────────┐
│ START: User invokes "Run sensei on {skill-name}" │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 1. READ: Load plugin/skills/{skill-name}/SKILL.md │
│ Load tests/{skill-name}/ (if exists) │
│ Count tokens (baseline for comparison) │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 2. SCORE: Run rule-based compliance check │
│ • Check description length (> 150 chars?) │
│ • Check for trigger phrases ("USE FOR:") │
│ • Check for anti-triggers ("DO NOT USE FOR:") │
│ • Check for compatibility field │
└─────────────────────┬───────────────────────────────────┘
▼
┌───────────────┐
│ Score >= M-H │──YES──▶ COMPLETE ✓
│ AND tests pass│ (next skill)
└───────┬───────┘
│ NO
▼
┌─────────────────────────────────────────────────────────┐
│ 3. SCAFFOLD: If tests/{skill-name}/ missing: │
│ cp -r tests/_template tests/{skill-name} │
│ Update SKILL_NAME in test files │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 4. IMPROVE FRONTMATTER: │
│ • Add "USE FOR:" with trigger phrases │
│ • Add "DO NOT USE FOR:" with anti-triggers │
│ • Add compatibility if applicable │
│ • Keep description under 1024 chars │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 5. IMPROVE TESTS: │
│ • Update shouldTriggerPrompts (5+ prompts) │
│ • Update shouldNotTriggerPrompts (5+ prompts) │
│ • Match prompts to new frontmatter triggers │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 6. VERIFY: npm test -- --testPathPatterns={skill-name} │
│ • If tests fail → fix and retry │
│ • If tests pass → continue │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 7. VALIDATE REFERENCES: │
│ npm run references {skill-name} │
│ • Check markdown links are valid │
│ • Ensure links stay within skill directory │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 8. CHECK TOKENS: │
│ npm run tokens -- check plugin/skills/{skill-name} │
│ npm run tokens -- suggest (gather optimizations) │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 9. SUMMARY: Display before/after comparison │
│ • Score change (Low → Medium-High) │
│ • Token delta (+/- tokens) │
│ • Unimplemented suggestions │
└─────────────────────┬───────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ 10. PROMPT USER: Choose action │
│ [C] Commit changes │
│ [I] Create GitHub issue with suggestions │
│ [S] Skip (discard changes) │
└─────────────────────┬───────────────────────────────────┘
▼
┌───────────────┐
│ Iteration < 5 │──YES──▶ Go to step 2
└───────┬───────┘
│ NO
▼
TIMEOUT (move to next skill)
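The control flow in the diagram can be modelled as a small loop. This is a hypothetical sketch, not Sensei's implementation: `runSenseiLoop`, `meetsTarget`, and the injected `score`, `testsPass`, and `improve` callbacks are illustrative names, and `improve` stands in for steps 3-10.

```javascript
// Hypothetical model of the per-skill loop in the diagram above.
// score/testsPass/improve are injected callbacks; "improve" stands in for
// steps 3-10 (scaffold, improve, verify, references, tokens, summary, prompt).
const LEVELS = ["Low", "Medium", "Medium-High", "High"];
const MAX_ITERATIONS = 5;

// True when `level` is at or above the target compliance level.
function meetsTarget(level, target = "Medium-High") {
  return LEVELS.indexOf(level) >= LEVELS.indexOf(target);
}

function runSenseiLoop(skill, { score, testsPass, improve }) {
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    const level = score(skill); // step 2: SCORE
    if (meetsTarget(level) && testsPass(skill)) {
      return { skill, status: "complete", level, iterations: iteration };
    }
    improve(skill); // steps 3-10, then back to step 2
  }
  // Iteration limit reached: TIMEOUT, move on to the next skill.
  return { skill, status: "timeout", iterations: MAX_ITERATIONS };
}
```

Injecting the steps keeps the loop's exit conditions (target reached with passing tests, or five iterations) separate from the work each step does.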
When running on multiple skills:
- Skills are processed sequentially
- Each skill goes through the full loop
- User prompted after each skill: Commit, Create Issue, or Skip
- Summary report at the end shows all results
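A minimal sketch of this batch behaviour, assuming a hypothetical `runOne(skill)` that runs the full loop (including the Commit, Create Issue, or Skip prompt) and returns an object with a `status`. `runBatch` is likewise an illustrative name.

```javascript
// Hypothetical sketch: process skills one at a time, keep going when one
// fails, and build the end-of-run summary of statuses.
function runBatch(skills, runOne) {
  const results = [];
  for (const skill of skills) { // sequential, never in parallel
    try {
      results.push({ skill, ...runOne(skill) });
    } catch (err) {
      // Record the failure and continue with the remaining skills.
      results.push({ skill, status: "failed", error: String(err.message || err) });
    }
  }
  // Summary report: count of skills per final status.
  const summary = {};
  for (const r of results) summary[r.status] = (summary[r.status] || 0) + 1;
  return { results, summary };
}
```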
| Setting | Default | Description |
|---|---|---|
| Max iterations | 5 | Per-skill iteration limit before moving on |
| Target score | Medium-High | Minimum compliance level |
| Token soft limit | 500 | SKILL.md target token count |
| User prompt | After each skill | Commit, Create Issue, or Skip |
| Continue on failure | Yes | Process remaining skills if one fails |
| Skip integration | No | Use --skip-integration to run only unit/trigger tests |
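The settings above could be represented as a defaults object with the one documented flag layered on top. This is only an illustration: Sensei is driven by prompts, and the property and function names below are assumptions.

```javascript
// Hypothetical settings object mirroring the configuration table.
// Property names are illustrative, not part of Sensei.
const DEFAULT_SETTINGS = Object.freeze({
  maxIterations: 5,            // per-skill limit before moving on
  targetScore: "Medium-High",  // minimum compliance level
  tokenSoftLimit: 500,         // SKILL.md target token count
  promptAfterEachSkill: true,  // Commit, Create Issue, or Skip
  continueOnFailure: true,     // keep processing remaining skills
  skipIntegration: false,      // run integration tests by default
});

// Overlay the one documented flag on top of the defaults.
function resolveSettings(args = []) {
  return { ...DEFAULT_SETTINGS, skipIntegration: args.includes("--skip-integration") };
}

// Test suites a run would execute under the given settings.
function testSuites(settings) {
  return settings.skipIntegration ? ["unit", "trigger"] : ["unit", "trigger", "integration"];
}
```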
| Level | Description | Criteria |
|---|---|---|
| Low | Basic description | No explicit triggers, no anti-triggers, often < 150 chars |
| Medium | Has trigger keywords | Description > 150 chars, implicit or explicit trigger phrases |
| Medium-High | Has triggers + anti-triggers | "USE FOR:" present AND "DO NOT USE FOR:" present |
| High | Full compliance | Triggers + anti-triggers + compatibility field |
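The levels can be approximated with simple string checks. This is a hypothetical sketch, not Sensei's scorer: `scoreFrontmatter` is an illustrative name, and because implicit triggers cannot be detected by string rules, Medium is approximated as "description longer than 150 characters". Note that "DO NOT USE FOR:" contains "USE FOR:", so anti-trigger markers are removed before looking for triggers.

```javascript
// Hypothetical rule-based scorer mirroring the scoring table.
const TRIGGER_MARKERS = ["USE FOR:", "TRIGGERS:", "Use this skill when"];
const ANTI_TRIGGER_MARKERS = ["DO NOT USE FOR:", "NOT FOR:"];

function scoreFrontmatter({ description = "", compatibility = "" }) {
  const hasAntiTriggers = ANTI_TRIGGER_MARKERS.some((m) => description.includes(m));
  // Strip anti-trigger markers so "DO NOT USE FOR:" is not counted as a trigger.
  const withoutAnti = ANTI_TRIGGER_MARKERS.reduce(
    (text, m) => text.split(m).join(""),
    description,
  );
  const hasTriggers = TRIGGER_MARKERS.some((m) => withoutAnti.includes(m));
  const longEnough = description.length > 150;
  const hasCompatibility = compatibility.trim().length > 0;

  if (longEnough && hasTriggers && hasAntiTriggers) {
    return hasCompatibility ? "High" : "Medium-High";
  }
  return longEnough ? "Medium" : "Low";
}
```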
Per the agentskills.io specification:
- Name validation
  - Lowercase alphanumeric characters and hyphens only
  - No consecutive hyphens (`--`)
  - Must not start or end with a hyphen
  - Matches the directory name
  - ≤ 64 characters
- Description length
  - Minimum: 150 characters (effective minimum used by Sensei)
  - Maximum: 1024 characters (spec limit)
- Trigger phrases
  - Contains "USE FOR:", "TRIGGERS:", or "Use this skill when"
  - Lists specific keywords and phrases
- Anti-triggers
  - Contains "DO NOT USE FOR:" or "NOT FOR:"
  - Lists scenarios that should use other skills
- Compatibility (optional for Medium-High)
  - Lists required tools/frameworks
  - Documents prerequisites
  - Max 500 characters per spec
- Optional spec fields (preserve if present)
  - `license`, `metadata`, `allowed-tools`
- Size limits
  - SKILL.md < 500 lines (spec recommendation)
  - SKILL.md < 500 tokens (soft), < 5000 (hard)
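The name, description, and compatibility rules above translate directly into checks. This is a hypothetical validator (`validateSpec` and its error strings are illustrative); `dirName` is assumed to be the skill's folder name under `plugin/skills/`.

```javascript
// Hypothetical validator for the frontmatter rules listed above.
// Returns a list of human-readable problems; an empty list means it passes.
function validateSpec({ name = "", description = "", compatibility }, dirName) {
  const errors = [];
  if (!/^[a-z0-9-]+$/.test(name)) errors.push("name: lowercase letters, digits and hyphens only");
  if (name.includes("--")) errors.push("name: no consecutive hyphens");
  if (name.startsWith("-") || name.endsWith("-")) errors.push("name: must not start or end with a hyphen");
  if (name.length > 64) errors.push("name: at most 64 characters");
  if (name !== dirName) errors.push("name: must match directory name");
  if (description.length < 150) errors.push("description: shorter than 150 characters");
  if (description.length > 1024) errors.push("description: longer than 1024 characters");
  if (compatibility !== undefined && compatibility.length > 500) {
    errors.push("compatibility: longer than 500 characters");
  }
  return errors;
}
```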
To reach Medium-High, a skill must have:
- ✅ Description > 150 characters
- ✅ Explicit trigger phrases ("USE FOR:" or equivalent)
- ✅ Anti-triggers ("DO NOT USE FOR:" or clear scope limitation)
- ✅ SKILL.md < 500 tokens (soft limit, monitored)
Token budgets (from the skill-authoring guidelines):
- SKILL.md: < 500 tokens (soft), < 5000 (hard)
- references/*.md: < 1000 tokens each
- Check with:
cd scripts && npm run tokens -- check plugin/skills/{skill}/SKILL.md
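The budgets can be illustrated with a rough estimator. This sketch does not reproduce `npm run tokens`; it assumes roughly 4 characters per token, a common rule of thumb for English text, so its counts are estimates only. `estimateTokens` and `checkBudget` are hypothetical names.

```javascript
// Rough token budget check using a ~4 characters/token heuristic (an assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function checkBudget(path, text) {
  const tokens = estimateTokens(text);
  if (path.endsWith("SKILL.md")) {
    // SKILL.md: < 500 tokens (soft), < 5000 (hard)
    if (tokens >= 5000) return { tokens, status: "over hard limit" };
    if (tokens >= 500) return { tokens, status: "over soft limit" };
    return { tokens, status: "ok" };
  }
  // references/*.md: < 1000 tokens each
  return { tokens, status: tokens >= 1000 ? "over limit" : "ok" };
}
```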
---
name: appinsights-instrumentation
description: 'Instrument a webapp to send useful telemetry data to Azure App Insights'
---
Problems:
- Only 71 characters
- No trigger phrases
- No anti-triggers
- Agent doesn't know when to activate
---
name: appinsights-instrumentation
description: >-
Instrument web apps to send telemetry to Azure Application Insights.
USE FOR: "add App Insights", "instrument my app", "set up monitoring",
"add telemetry", "track requests", "ASP.NET Core telemetry", "Node.js monitoring".
DO NOT USE FOR: querying logs (use azure-observability), creating alerts,
dashboard configuration, or cost analysis.
---
Improvements:
- ~340 characters (informative, and well under the 1024-character limit)
- Clear description of purpose
- Explicit trigger phrases
- Anti-triggers prevent collision with azure-observability
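To check the before and after descriptions mechanically, the short script below measures length and looks for the two markers. `inspectDescription` is a hypothetical helper; the "after" lines are joined with single spaces, which is how YAML's `>-` folded style turns them into one string.

```javascript
// Compare the "before" and "after" descriptions from this example.
const before = "Instrument a webapp to send useful telemetry data to Azure App Insights";
const after = [
  "Instrument web apps to send telemetry to Azure Application Insights.",
  'USE FOR: "add App Insights", "instrument my app", "set up monitoring",',
  '"add telemetry", "track requests", "ASP.NET Core telemetry", "Node.js monitoring".',
  "DO NOT USE FOR: querying logs (use azure-observability), creating alerts,",
  "dashboard configuration, or cost analysis.",
].join(" "); // `>-` folds line breaks into single spaces

function inspectDescription(description) {
  // Remove the anti-trigger marker first: it contains "USE FOR:" too.
  const withoutAnti = description.split("DO NOT USE FOR:").join("");
  return {
    length: description.length,
    triggers: withoutAnti.includes("USE FOR:"),
    antiTriggers: description.includes("DO NOT USE FOR:"),
  };
}

console.log(inspectDescription(before)); // { length: 71, triggers: false, antiTriggers: false }
console.log(inspectDescription(after)); // { length: 339, triggers: true, antiTriggers: true }
```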
Before (empty):
const shouldTriggerPrompts = [];
const shouldNotTriggerPrompts = [];
After:
const shouldTriggerPrompts = [
'Add App Insights to my web app',
'Instrument my ASP.NET Core app for monitoring',
'Set up telemetry for my Node.js application',
'How do I track requests in Application Insights?',
'Add monitoring to my webapp',
];
const shouldNotTriggerPrompts = [
'Query my Application Insights logs',
'Create an alert for high CPU usage',
'Show me my App Insights dashboard',
'How much does App Insights cost?',
'Help me with AWS CloudWatch',
];
Symptom: Tests fail after frontmatter changes
Solution:
- Check that shouldTriggerPrompts match the new trigger phrases
- Check that shouldNotTriggerPrompts match the new anti-triggers
- Run tests manually to see specific failures:
  cd tests
  npm test -- --testPathPatterns={skill-name} --verbose
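Before rerunning the suite, a quick sanity check on the two prompt lists can catch simple mistakes. This hypothetical helper (`checkPromptLists` is not part of the test template) enforces the "5+ prompts" rule from step 5 and flags any prompt that appears in both lists, ignoring case and surrounding whitespace.

```javascript
// Hypothetical pre-flight check for a skill's trigger-test prompt lists.
function checkPromptLists(shouldTrigger, shouldNotTrigger, minCount = 5) {
  const problems = [];
  if (shouldTrigger.length < minCount) {
    problems.push(`shouldTriggerPrompts has ${shouldTrigger.length} prompts, need ${minCount}+`);
  }
  if (shouldNotTrigger.length < minCount) {
    problems.push(`shouldNotTriggerPrompts has ${shouldNotTrigger.length} prompts, need ${minCount}+`);
  }
  // A prompt cannot both trigger and not trigger the same skill.
  const normalize = (p) => p.trim().toLowerCase();
  const triggerSet = new Set(shouldTrigger.map(normalize));
  for (const prompt of shouldNotTrigger) {
    if (triggerSet.has(normalize(prompt))) problems.push(`"${prompt}" appears in both lists`);
  }
  return problems;
}
```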
Symptom: Ralph loops 5 times without reaching Medium-High
Possible causes:
- Description too long (> 1024 chars) - trim content
- Anti-triggers not in recognized format - use "DO NOT USE FOR:"
- Conflicting triggers with other skills - make them more specific
# Undo last commit
git reset --soft HEAD~1
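# Undo all sensei commits for a skill: find the commit just before them,
# then hard-reset to it (this also discards any uncommitted changes)
git log --oneline | grep "sensei: improve {skill-name}" | head -5
git reset --hard {commit-before-sensei}
# See all sensei improvements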
git log --oneline --grep="sensei:"
# See changes to a specific skill
git log --oneline -p plugin/skills/{skill-name}/SKILL.md
The Sensei skill lives at .github/skills/sensei/. To improve it:
- Edit SKILL.md for instruction changes
- Edit references/*.md for documentation changes
- Test on a sample skill before committing
To add a new scoring rule:
- Document the rule in references/SCORING.md
- Add examples in references/EXAMPLES.md
- Update the rule-based checks in SKILL.md
If Sensei produces unexpected results:
- Note the skill name and starting state
- Capture the commit history: git log --oneline -10
- Open an issue with reproduction steps
- Ralph (Copilot CLI runner) - Original Ralph loop implementation
- Agent Skills Specification - Official spec
- Frontmatter Compliance Audit - Audit results
- Anthropic Skills Best Practices - Writing guidance
Sensei - "The path to compliance begins with a single trigger." 🥋
- markdown-token-optimizer - Token analysis and optimization suggestions
- skill-authoring - Guidelines for writing compliant Agent Skills