Sensei

"A true master teaches not by telling, but by refining." - The Skill Sensei

Sensei automates the improvement of Agent Skills frontmatter compliance using the Ralph loop pattern - iteratively improving skills until they reach Medium-High compliance with all tests passing.

Overview

The Problem

The frontmatter audit of the 26 Azure skills found:

0% High adherence - No skills have triggers + anti-triggers + compatibility
46% Low adherence - 12 skills have minimal descriptions without clear triggers
0/26 anti-triggers - No skills tell agents when NOT to use them

This leads to skill collision - agents invoking the wrong skill for a given prompt.

The Solution

Sensei implements the "Ralph Wiggum" technique:

1. Read - Load the skill's current state and token count
2. Score - Evaluate frontmatter compliance
3. Improve - Add triggers, anti-triggers, compatibility
4. Verify - Run tests to ensure changes work
5. Validate References - Check markdown links are valid
6. Check Tokens - Analyze token usage, gather suggestions
7. Summary - Display before/after with suggestions
8. Prompt - Ask user: Commit, Create Issue, or Skip?
9. Repeat - Until target score reached
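
The Validate References step can be pictured with a small sketch. It assumes that a link passes when its relative target resolves to a path inside the skill's own directory, and it does not check that the file exists. `extractLinks` and `staysInsideSkill` are hypothetical names for illustration, not part of Sensei or of `npm run references`.

```javascript
const path = require('path');

// Hypothetical helper: collect the targets of inline markdown links, [text](target).
function extractLinks(markdown) {
  const targets = [];
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  for (const match of markdown.matchAll(linkPattern)) {
    targets.push(match[1]);
  }
  return targets;
}

// Hypothetical check: does a relative link resolve to a path inside skillDir?
function staysInsideSkill(skillDir, target) {
  // Assumption: anything with a URL scheme (https:, mailto:) is treated as outside.
  if (/^[a-z][a-z0-9+.-]*:/i.test(target)) return false;
  const root = path.resolve(skillDir);
  const resolved = path.resolve(root, target.split('#')[0]); // drop any #anchor
  const relative = path.relative(root, resolved);
  const escapes = relative === '..' || relative.startsWith('..' + path.sep);
  return !escapes && !path.isAbsolute(relative);
}

const sample = 'See [scoring](references/SCORING.md) and [other](../other-skill/SKILL.md).';
const report = extractLinks(sample).map((target) => ({
  target,
  inside: staysInsideSkill('plugin/skills/example-skill', target),
}));
console.log(report);
```

In this sample the first link stays inside the skill directory and the second one escapes it, which is the kind of link the step is meant to flag.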

Quick Start

Single Skill

Run sensei on appinsights-instrumentation

Single Skill (Fast Mode)

Run sensei on appinsights-instrumentation --skip-integration

Multiple Skills

Run sensei on azure-security, azure-observability

All Low-Adherence Skills

Run sensei on all Low-adherence skills

All Skills

Run sensei on all skills

Flags

| Flag | Description |
|------|-------------|
| `--skip-integration` | Skip integration tests for faster iteration (unit + trigger tests only) |

⚠️ Note: Using --skip-integration speeds up the loop significantly but may miss runtime issues. Consider running full tests before final commit.

Prerequisites

Required

Copilot CLI - Installed and authenticated
```
copilot --version
```
Node.js - For running tests
```
node --version  # v18+ recommended
```
Git - For commits
```
git --version
```

Setup

# Install test dependencies
cd tests
npm install

# Verify tests run
npm test -- --testPathPatterns=azure-validation

How It Works

The Ralph Loop

┌─────────────────────────────────────────────────────────┐
│  START: User invokes "Run sensei on {skill-name}"       │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  1. READ: Load plugin/skills/{skill-name}/SKILL.md      │
│           Load tests/{skill-name}/ (if exists)          │
│           Count tokens (baseline for comparison)        │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  2. SCORE: Run rule-based compliance check              │
│     • Check description length (> 150 chars?)           │
│     • Check for trigger phrases ("USE FOR:")            │
│     • Check for anti-triggers ("DO NOT USE FOR:")       │
│     • Check for compatibility field                     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Score >= M-H  │──YES──▶ COMPLETE ✓
              │ AND tests pass│        (next skill)
              └───────┬───────┘
                      │ NO
                      ▼
┌─────────────────────────────────────────────────────────┐
│  3. SCAFFOLD: If tests/{skill-name}/ missing:           │
│     cp -r tests/_template tests/{skill-name}            │
│     Update SKILL_NAME in test files                     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  4. IMPROVE FRONTMATTER:                                │
│     • Add "USE FOR:" with trigger phrases               │
│     • Add "DO NOT USE FOR:" with anti-triggers          │
│     • Add compatibility if applicable                   │
│     • Keep description under 1024 chars                 │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  5. IMPROVE TESTS:                                      │
│     • Update shouldTriggerPrompts (5+ prompts)          │
│     • Update shouldNotTriggerPrompts (5+ prompts)       │
│     • Match prompts to new frontmatter triggers         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  6. VERIFY: npm test -- --testPathPatterns={skill-name} │
│     • If tests fail → fix and retry                     │
│     • If tests pass → continue                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  7. VALIDATE REFERENCES:                                │
│     npm run references {skill-name}                     │
│     • Check markdown links are valid                    │
│     • Ensure links stay within skill directory          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  8. CHECK TOKENS:                                       │
│     npm run tokens -- check plugin/skills/{skill-name}  │
│     npm run tokens -- suggest (gather optimizations)    │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  9. SUMMARY: Display before/after comparison            │
│     • Score change (Low → Medium-High)                  │
│     • Token delta (+/- tokens)                          │
│     • Unimplemented suggestions                         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  10. PROMPT USER: Choose action                         │
│     [C] Commit changes                                  │
│     [I] Create GitHub issue with suggestions            │
│     [S] Skip (discard changes)                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Iteration < 5 │──YES──▶ Go to step 2
              └───────┬───────┘
                      │ NO
                      ▼
               TIMEOUT (move to next skill)
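
The control flow in the diagram can be summarized in code. This is a minimal sketch under stated assumptions, not Sensei's implementation: `ralphLoop`, `score`, `improve`, and `runTests` are hypothetical names, and the scoring, improvement, and test steps are passed in as plain functions so the sketch stays runnable. The user prompt in step 10 is left out.

```javascript
// Adherence levels in ascending order, as listed under Scoring Criteria.
const LEVELS = ['Low', 'Medium', 'Medium-High', 'High'];

// Hypothetical driver: score, then improve, until the target level is reached
// with passing tests or the iteration limit is hit.
function ralphLoop(skill, { score, improve, runTests, target = 'Medium-High', maxIterations = 5 }) {
  let current = skill;
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const level = score(current); // step 2
    const reached = LEVELS.indexOf(level) >= LEVELS.indexOf(target);
    if (reached && runTests(current)) {
      return { status: 'complete', iterations: iteration, level, skill: current };
    }
    current = improve(current); // steps 3-5: scaffold, frontmatter, tests
  }
  // Iteration limit hit: report a timeout so the caller can move to the next skill.
  return { status: 'timeout', iterations: maxIterations, level: score(current), skill: current };
}

// Toy stand-ins so the sketch runs: each "improvement" raises the level by one.
const result = ralphLoop(
  { name: 'example-skill', level: 'Low' },
  {
    score: (s) => s.level,
    improve: (s) => ({ ...s, level: LEVELS[Math.min(LEVELS.indexOf(s.level) + 1, LEVELS.length - 1)] }),
    runTests: () => true,
  },
);
console.log(result.status, result.iterations, result.level); // complete 3 Medium-High
```

Passing the steps in as functions keeps the loop itself independent of how scoring or testing is done, which is a design choice of this sketch rather than something the README specifies.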

Batch Processing

When running on multiple skills:

Skills are processed sequentially
Each skill goes through the full loop
User prompted after each skill: Commit, Create Issue, or Skip
Summary report at the end shows all results
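
A minimal sketch of the sequential behavior described above, assuming the "Continue on failure" default listed under Configuration. `runBatch` and `processSkill` are hypothetical names; `processSkill` stands in for the full per-skill loop plus the user prompt.

```javascript
// Hypothetical batch runner: handles skills one at a time and keeps going
// when a single skill throws, collecting results for the final summary.
function runBatch(skillNames, processSkill) {
  const results = [];
  for (const name of skillNames) {
    try {
      results.push({ name, status: processSkill(name) });
    } catch (error) {
      results.push({ name, status: 'failed', reason: error.message });
    }
  }
  return results;
}

// Toy processSkill so the sketch runs; the second skill fails on purpose.
const summary = runBatch(
  ['azure-security', 'azure-observability', 'appinsights-instrumentation'],
  (name) => {
    if (name === 'azure-observability') throw new Error('tests still failing');
    return 'committed';
  },
);
console.table(summary);
```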

Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| Max iterations | 5 | Per-skill iteration limit before moving on |
| Target score | Medium-High | Minimum compliance level |
| Token soft limit | 500 | SKILL.md target token count |
| User prompt | After each skill | Commit, Create Issue, or Skip |
| Continue on failure | Yes | Process remaining skills if one fails |
| Skip integration | No | Use `--skip-integration` to run only unit/trigger tests |

Scoring Criteria

Adherence Levels

| Level | Description | Criteria |
|-------|-------------|----------|
| Low | Basic description | No explicit triggers, no anti-triggers, often < 150 chars |
| Medium | Has trigger keywords | Description > 150 chars, implicit or explicit trigger phrases |
| Medium-High | Has triggers + anti-triggers | "USE FOR:" present AND "DO NOT USE FOR:" present |
| High | Full compliance | Triggers + anti-triggers + compatibility field |
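
The table can be read as a small decision procedure. The sketch below is one possible reading, not Sensei's actual scorer: `scoreAdherence` is a hypothetical name, the marker strings are the ones this README lists under Rule-Based Checks, and the handling of the Medium level (where implicit triggers cannot be detected by a simple rule) is an assumption. The two sample descriptions are taken from the Examples section.

```javascript
// Marker strings as listed under Rule-Based Checks in this README.
const TRIGGER_MARKERS = ['USE FOR:', 'TRIGGERS:', 'Use this skill when'];
const ANTI_TRIGGER_MARKERS = ['DO NOT USE FOR:', 'NOT FOR:'];

// Hypothetical scorer mapping a description (and optional compatibility text) to a level.
function scoreAdherence(description, compatibility = '') {
  const hasAntiTriggers = ANTI_TRIGGER_MARKERS.some((marker) => description.includes(marker));
  // "DO NOT USE FOR:" contains "USE FOR:", so strip anti-trigger markers
  // before looking for trigger markers.
  const withoutAnti = ANTI_TRIGGER_MARKERS.reduce(
    (text, marker) => text.split(marker).join(' '),
    description,
  );
  const hasTriggers = TRIGGER_MARKERS.some((marker) => withoutAnti.includes(marker));
  const longEnough = description.length > 150;

  if (longEnough && hasTriggers && hasAntiTriggers) {
    return compatibility.trim() ? 'High' : 'Medium-High';
  }
  // Assumption: length or an explicit trigger marker alone counts as Medium.
  if (longEnough || hasTriggers) return 'Medium';
  return 'Low';
}

const before = 'Instrument a webapp to send useful telemetry data to Azure App Insights';
const after = [
  'Instrument web apps to send telemetry to Azure Application Insights.',
  'USE FOR: "add App Insights", "instrument my app", "set up monitoring",',
  '"add telemetry", "track requests", "ASP.NET Core telemetry", "Node.js monitoring".',
  'DO NOT USE FOR: querying logs (use azure-observability), creating alerts,',
  'dashboard configuration, or cost analysis.',
].join(' ');

console.log(scoreAdherence(before)); // Low
console.log(scoreAdherence(after)); // Medium-High
```

The substring overlap between "USE FOR:" and "DO NOT USE FOR:" is worth noting: a naive `includes('USE FOR:')` check would count a description that has only anti-triggers as having triggers.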

Rule-Based Checks

Per the agentskills.io specification:

1. Name validation
   - Lowercase alphanumeric + hyphens only
   - No consecutive hyphens (--)
   - Must not start or end with hyphen
   - Matches directory name
   - ≤ 64 characters
2. Description length
   - Minimum: 150 characters (effective)
   - Maximum: 1024 characters (spec limit)
3. Trigger phrases
   - Contains "USE FOR:", "TRIGGERS:", or "Use this skill when"
   - Lists specific keywords and phrases
4. Anti-triggers
   - Contains "DO NOT USE FOR:" or "NOT FOR:"
   - Lists scenarios that should use other skills
5. Compatibility (optional for Medium-High)
   - Lists required tools/frameworks
   - Documents prerequisites
   - Max 500 characters per spec
6. Optional spec fields (preserve if present)
   - license, metadata, allowed-tools
7. Size limits
   - SKILL.md < 500 lines (spec recommendation)
   - SKILL.md < 500 tokens (soft), < 5000 (hard)
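
The name rules above translate directly into code. This is a sketch of those five rules only; `validateSkillName` is a hypothetical helper, not a function from Sensei or the spec tooling.

```javascript
// Hypothetical validator: returns a list of rule violations (empty when the name is valid).
function validateSkillName(name, directoryName) {
  const problems = [];
  if (!/^[a-z0-9-]+$/.test(name)) problems.push('use lowercase letters, digits, and hyphens only');
  if (name.includes('--')) problems.push('no consecutive hyphens');
  if (name.startsWith('-') || name.endsWith('-')) problems.push('must not start or end with a hyphen');
  if (name !== directoryName) problems.push('must match the directory name');
  if (name.length > 64) problems.push('must be 64 characters or fewer');
  return problems;
}

const validName = validateSkillName('appinsights-instrumentation', 'appinsights-instrumentation');
const invalidName = validateSkillName('-Azure--Skill', 'azure-skill');
console.log(validName); // []
console.log(invalidName);
```

Returning every violation at once, rather than stopping at the first, lets a single pass report everything that needs fixing in the frontmatter.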

Target: Medium-High

To reach Medium-High, a skill must have:

✅ Description > 150 characters
✅ Explicit trigger phrases ("USE FOR:" or equivalent)
✅ Anti-triggers ("DO NOT USE FOR:" or clear scope limitation)
✅ SKILL.md < 500 tokens (soft limit, monitored)

Token Budget

From skill-authoring:

SKILL.md: < 500 tokens (soft), < 5000 (hard)
references/*.md: < 1000 tokens each
Check with: cd scripts && npm run tokens -- check plugin/skills/{skill}/SKILL.md
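
The limits above can be expressed as a small classifier. The sketch takes a token count as input, for example one reported by `npm run tokens`, and does not count tokens itself; `classifySkillTokens` and `referenceWithinBudget` are hypothetical names.

```javascript
// Limits as stated in this section.
const SKILL_LIMITS = { soft: 500, hard: 5000 };
const REFERENCE_LIMIT = 1000;

// Hypothetical classifier for a SKILL.md token count.
function classifySkillTokens(tokenCount) {
  if (tokenCount >= SKILL_LIMITS.hard) return 'over hard limit';
  if (tokenCount >= SKILL_LIMITS.soft) return 'over soft limit';
  return 'within budget';
}

// Hypothetical check for a single references/*.md file.
function referenceWithinBudget(tokenCount) {
  return tokenCount < REFERENCE_LIMIT;
}

console.log(classifySkillTokens(420)); // within budget
console.log(classifySkillTokens(1200)); // over soft limit
console.log(classifySkillTokens(5000)); // over hard limit
console.log(referenceWithinBudget(999)); // true
```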

Examples

Before: Low Adherence

---
name: appinsights-instrumentation
description: 'Instrument a webapp to send useful telemetry data to Azure App Insights'
---

Problems:

Only 71 characters
No trigger phrases
No anti-triggers
Agent doesn't know when to activate

After: Medium-High Adherence

---
name: appinsights-instrumentation
description: >-
  Instrument web apps to send telemetry to Azure Application Insights.
  USE FOR: "add App Insights", "instrument my app", "set up monitoring",
  "add telemetry", "track requests", "ASP.NET Core telemetry", "Node.js monitoring".
  DO NOT USE FOR: querying logs (use azure-observability), creating alerts,
  dashboard configuration, or cost analysis.
---

Improvements:

~350 characters (informative but under limit)
Clear description of purpose
Explicit trigger phrases
Anti-triggers prevent collision with azure-observability

Test Updates

Before (empty):

const shouldTriggerPrompts = [];
const shouldNotTriggerPrompts = [];

After:

const shouldTriggerPrompts = [
  'Add App Insights to my web app',
  'Instrument my ASP.NET Core app for monitoring',
  'Set up telemetry for my Node.js application',
  'How do I track requests in Application Insights?',
  'Add monitoring to my webapp',
];

const shouldNotTriggerPrompts = [
  'Query my Application Insights logs',
  'Create an alert for high CPU usage',
  'Show me my App Insights dashboard',
  'How much does App Insights cost?',
  'Help me with AWS CloudWatch',
];
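
Before running the trigger tests, the two arrays can be sanity-checked against the "5+ prompts" guideline from the loop. This is an illustrative sketch; `checkPromptLists` is a hypothetical helper and is not part of the test template.

```javascript
// Hypothetical pre-flight check: enough prompts in each list, and no prompt in both.
function checkPromptLists(shouldTrigger, shouldNotTrigger, minimum = 5) {
  const problems = [];
  if (shouldTrigger.length < minimum) {
    problems.push(`shouldTriggerPrompts needs at least ${minimum} prompts`);
  }
  if (shouldNotTrigger.length < minimum) {
    problems.push(`shouldNotTriggerPrompts needs at least ${minimum} prompts`);
  }
  const normalize = (prompt) => prompt.trim().toLowerCase();
  const negatives = new Set(shouldNotTrigger.map(normalize));
  for (const prompt of shouldTrigger) {
    if (negatives.has(normalize(prompt))) {
      problems.push(`prompt appears in both lists: "${prompt}"`);
    }
  }
  return problems;
}

const emptyLists = checkPromptLists([], []);
const overlapping = checkPromptLists(
  ['Add App Insights to my web app', 'a', 'b', 'c', 'd'],
  ['add app insights to my web app', 'e', 'f', 'g', 'h'],
);
console.log(emptyLists);
console.log(overlapping);
```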

Troubleshooting

Tests Failing After Improvement

Symptom: Tests fail after frontmatter changes

Solution:

Check that shouldTriggerPrompts match the new trigger phrases
Check that shouldNotTriggerPrompts match the new anti-triggers

Run tests manually to see specific failures:

cd tests
npm test -- --testPathPatterns={skill-name} --verbose

Skill Not Reaching Target Score

Symptom: Ralph loops 5 times without reaching Medium-High

Possible causes:

Description too long (> 1024 chars) - trim content
Anti-triggers not in recognized format - use "DO NOT USE FOR:"
Conflicting triggers with other skills - make more specific

Rolling Back Changes

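# Undo the last commit (keeps its changes staged)
git reset --soft HEAD~1

# Undo all sensei commits for a skill: list them to find the oldest one,
# then reset to the commit just before it.
# Warning: --hard discards uncommitted work and every later commit, sensei or not.
git log --oneline | grep "sensei: improve {skill-name}" | head -5
git reset --hard {commit-before-sensei}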

Viewing Progress

# See all sensei improvements
git log --oneline --grep="sensei:"

# See changes to a specific skill
git log --oneline -p plugin/skills/{skill-name}/SKILL.md

Contributing

Improving the Sensei Skill

The Sensei skill lives at .github/skills/sensei/. To improve it:

Edit SKILL.md for instruction changes
Edit references/*.md for documentation changes
Test on a sample skill before committing

Adding New Scoring Rules

Document the rule in references/SCORING.md
Add examples in references/EXAMPLES.md
Update the rule-based checks in SKILL.md

Reporting Issues

If Sensei produces unexpected results:

Note the skill name and starting state
Capture the commit history: git log --oneline -10
Open an issue with reproduction steps

References

Ralph (Copilot CLI runner) - Original Ralph loop implementation
Agent Skills Specification - Official spec
Frontmatter Compliance Audit - Audit results
Anthropic Skills Best Practices - Writing guidance

Sensei - "The path to compliance begins with a single trigger." 🥋

Related Skills

markdown-token-optimizer - Token analysis and optimization suggestions
skill-authoring - Guidelines for writing compliant Agent Skills
