Summary
Add a new grader that validates whether a skill should or shouldn't activate for a given prompt. The grader extracts keywords from SKILL.md (name, description, content) and matches them against the task prompt with a configurable threshold.
Motivation
Evals currently have no way to validate trigger accuracy: whether a skill activates for relevant prompts and stays silent for irrelevant ones. This is the primary missing grader type for skill quality assessment.
Proposed Implementation
Create internal/graders/trigger_grader.go:
- Inputs: SKILL.md path, prompt text, expected outcome (should_trigger / should_not_trigger)
- Keyword extraction: Parse SKILL.md description, trigger phrases, and content to build a keyword set
- Matching: Score prompt against keyword set using configurable threshold (default: 0.6)
- Modes:
  - positive: skill SHOULD activate for the prompt (score >= threshold → pass)
  - negative: skill should NOT activate (score < threshold → pass)
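The extraction, matching, and mode rules above could be sketched roughly as follows. This is only an illustration, not the proposed implementation: the issue does not define how the score is computed, so the sketch assumes it is the fraction of prompt tokens that appear in the skill's keyword set. The names `tokenize`, `ExtractKeywords`, `Score`, and `Grade`, and the tiny stop-word list, are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a deliberately tiny, illustrative list (assumption);
// a real grader would likely need a larger one.
var stopWords = map[string]bool{
	"the": true, "and": true, "for": true, "with": true,
	"that": true, "this": true, "from": true, "about": true,
}

// tokenize lowercases text, splits on anything that is not a letter or
// digit, and drops stop words and tokens shorter than 3 bytes.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var out []string
	for _, f := range fields {
		if len(f) > 2 && !stopWords[f] {
			out = append(out, f)
		}
	}
	return out
}

// ExtractKeywords builds a keyword set from the raw text of SKILL.md.
// It treats name, description, and content uniformly (assumption).
func ExtractKeywords(skillMD string) map[string]bool {
	set := make(map[string]bool)
	for _, t := range tokenize(skillMD) {
		set[t] = true
	}
	return set
}

// Score returns the fraction of prompt tokens found in the keyword set.
// This metric is an assumption; the issue only says "score".
func Score(prompt string, keywords map[string]bool) float64 {
	tokens := tokenize(prompt)
	if len(tokens) == 0 {
		return 0
	}
	hits := 0
	for _, t := range tokens {
		if keywords[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(tokens))
}

// Grade applies the mode rule from the issue:
// positive passes when score >= threshold, negative when score < threshold.
func Grade(mode string, score, threshold float64) (bool, error) {
	switch mode {
	case "positive":
		return score >= threshold, nil
	case "negative":
		return score < threshold, nil
	default:
		return false, fmt.Errorf("unknown trigger mode %q", mode)
	}
}
```

With this metric, a skill text mentioning "deploy" and "azure" would score the prompt "Deploy my app to Azure" at 2 of 3 tokens, which clears the default threshold of 0.6 in positive mode. Normalizing by prompt length is a design choice that penalizes long prompts; normalizing by keyword count instead would behave quite differently.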
Example eval.yaml
graders:
  - type: trigger
    params:
      skill_path: skills/azure-deploy/SKILL.md
      mode: positive
      threshold: 0.6
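As a rough sketch of how the grader might consume these params once the YAML has been decoded into a generic map: the `TriggerParams` struct and `ParseTriggerParams` function below are hypothetical names, not part of the existing codebase. The sketch assumes `skill_path` and `mode` are required, `threshold` falls back to the stated default of 0.6, and valid thresholds lie in [0, 1].

```go
package main

import "fmt"

// defaultThreshold is the default named in the issue.
const defaultThreshold = 0.6

// TriggerParams mirrors the params block of the example (hypothetical type).
type TriggerParams struct {
	SkillPath string
	Mode      string
	Threshold float64
}

// ParseTriggerParams validates a decoded params map and applies defaults.
// Which fields are required and the [0, 1] range check are assumptions.
func ParseTriggerParams(raw map[string]any) (TriggerParams, error) {
	p := TriggerParams{Threshold: defaultThreshold}

	path, ok := raw["skill_path"].(string)
	if !ok || path == "" {
		return p, fmt.Errorf("trigger grader: skill_path is required")
	}
	p.SkillPath = path

	mode, _ := raw["mode"].(string)
	if mode != "positive" && mode != "negative" {
		return p, fmt.Errorf("trigger grader: mode must be positive or negative, got %q", mode)
	}
	p.Mode = mode

	if v, present := raw["threshold"]; present {
		switch t := v.(type) {
		case float64:
			p.Threshold = t
		case int:
			// YAML decoders commonly yield int for values such as 1.
			p.Threshold = float64(t)
		default:
			return p, fmt.Errorf("trigger grader: threshold must be a number, got %T", v)
		}
	}
	if p.Threshold < 0 || p.Threshold > 1 {
		return p, fmt.Errorf("trigger grader: threshold %v is outside [0, 1]", p.Threshold)
	}
	return p, nil
}
```

Rejecting an unknown mode at parse time, rather than silently defaulting to positive, keeps a typo in eval.yaml from turning a negative test into a positive one.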
Acceptance Criteria
- trigger_grader.go with keyword extraction from SKILL.md