feat: Add trigger heuristic grader#90
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new trigger heuristic grader type to validate whether a prompt should (or should not) activate a skill, closing #80.
Changes:
- Introduces
triggergrader implementation + unit tests. - Wires
triggerinto grader type registries, creation factory, and JSON schemas. - Adds end-user documentation for configuring and interpreting trigger grading output.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| schemas/task.schema.json | Adds trigger grader type and triggerGraderConfig definition for task schema validation. |
| schemas/eval.schema.json | Adds trigger grader type and documented triggerGraderConfig definition for eval schema validation. |
| internal/suggest/suggest.go | Includes trigger in the set of grader types surfaced by suggest. |
| internal/suggest/grader_docs.go | Adds a short description for the trigger grader in suggest docs. |
| internal/models/outcome.go | Defines GraderKindTrigger constant for the new grader kind. |
| internal/graders/trigger_grader.go | Implements the trigger heuristic grader (keyword + phrase scoring). |
| internal/graders/trigger_grader_test.go | Adds tests for constructor validation, mode behavior, thresholds, and factory creation. |
| internal/graders/grader.go | Extends Create factory to construct the trigger grader from config. |
| docs/graders/trigger.md | Documents trigger grader configuration, scoring behavior, and output details. |
| docs/graders/README.md | Links the new trigger grader docs from the graders index. |
| .squad/log/2026-03-05T00-36-issue-assignment-pipeline.md | Adds session log entry (process documentation). |
| .squad/log/2026-03-05T00-26-rusty-token-diff-design.md | Adds session log entry (process documentation). |
| .squad/decisions.md | Records decision notes (process documentation). |
| .squad/agents/linus/history.md | Records learnings summary for #80 (process documentation). |
Comments suppressed due to low confidence (4)
internal/graders/trigger_grader.go:163
bestPhraseScorere-tokenizes the prompt even thoughscorePromptalready computedpromptTokens. Consider passingpromptTokens(and/or a precomputed token set + lowercased prompt) intobestPhraseScoreto avoid duplicate tokenization and allocation on every grade call.
func (g *triggerHeuristicGrader) scorePrompt(prompt string) (score float64, phraseScore float64, matched []string) {
promptTokens := tokenize(prompt)
if len(promptTokens) == 0 {
return 0, 0, nil
}
seen := make(map[string]bool)
for _, token := range promptTokens {
if _, ok := g.keywords[token]; ok && !seen[token] {
seen[token] = true
matched = append(matched, token)
}
}
tokenScore := float64(len(matched)) / float64(len(promptTokens))
phraseScore = g.bestPhraseScore(prompt)
if phraseScore > tokenScore {
return phraseScore, phraseScore, matched
}
return tokenScore, phraseScore, matched
}
func (g *triggerHeuristicGrader) bestPhraseScore(prompt string) float64 {
if len(g.triggerPhrases) == 0 {
return 0
}
promptLower := strings.ToLower(prompt)
promptTokenSet := make(map[string]struct{})
for _, token := range tokenize(prompt) {
promptTokenSet[token] = struct{}{}
}
internal/graders/trigger_grader.go:203
- The error messages hardcode
SKILL.md, butskillPathmay be any filename/path (even though docs recommend SKILL.md). Consider changing these messages to reference “skill file” (or include the provided path without assuming the basename) to avoid misleading users during debugging.
func loadTriggerHeuristicData(skillPath string) (map[string]struct{}, []string, error) {
data, err := os.ReadFile(skillPath)
if err != nil {
return nil, nil, fmt.Errorf("reading SKILL.md %s: %w", skillPath, err)
}
var sk skill.Skill
if err := sk.UnmarshalText(data); err != nil {
return nil, nil, fmt.Errorf("parsing SKILL.md %s: %w", skillPath, err)
}
schemas/task.schema.json:868
triggerGraderConfigintask.schema.jsonis missing thedescriptionfields that were added ineval.schema.json(for the object and its properties). Adding the same descriptions here would keep the schemas consistent and improve generated docs / editor intellisense for task configs.
"triggerGraderConfig": {
"type": "object",
"required": [
"skill_path",
"mode"
],
"additionalProperties": false,
"properties": {
"skill_path": {
"type": "string",
"minLength": 1
},
"mode": {
"type": "string",
"enum": [
"positive",
"negative"
]
},
"threshold": {
"type": "number",
"minimum": 0,
"maximum": 1,
"default": 0.6
}
}
},
internal/graders/trigger_grader_test.go:141
- Prefer
require.GreaterOrEqual(t, result.Score, threshold)for clearer failure output and consistency with other assertions in this test file.
require.True(t, result.Score >= threshold)
require.True(t, result.Passed)
|
I like the consistency of having everything be a grader instead of having graders and trigger tests defined separately as we do today (for example). Do you want to remove the existing trigger test feature in this PR? |
spboyer
left a comment
There was a problem hiding this comment.
Rusty (Opus 4.6) —
The trigger heuristic grader implementation is solid architecturally — clean grader pattern, constructor validation, 4 test functions, \docs/graders/trigger.md, schema updates. I want to approve this.
Blocker: Merge conflict. GitHub reports \mergeable: CONFLICTING. Only 1 CI check (CLA) ran — the full Go CI matrix didn't execute.
Action needed:
- Rebase onto main to resolve the merge conflict (likely in \internal/models/outcome.go\ where fields overlap with PR #91)
- Verify all 7 CI checks pass
- Push the rebased branch
Once CI is green, I'll approve immediately.
7d59e0e to
671e8a9
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #90 +/- ##
=======================================
Coverage ? 73.37%
=======================================
Files ? 132
Lines ? 15231
Branches ? 0
=======================================
Hits ? 11176
Misses ? 3238
Partials ? 817
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
671e8a9 to
9fec0b1
Compare
9fec0b1 to
3f422a9
Compare
wbreza
left a comment
There was a problem hiding this comment.
Code Review: PR #90 - feat: Add trigger heuristic grader
What Looks Good
- Follows existing grader patterns
- Clean mode separation with good tests
- Smart keyword extraction strips DO NOT USE FOR content
- Dual scoring approach with good coverage
- Thorough constructor validation and docs included
Suggestions (non-blocking)
- Stop words include trigger verbs (use, when, help). Consider excluding from stop words.
Summary
| Priority | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 1 |
| Low | 2 |
Overall Assessment: Approve
3f422a9 to
e357ce2
Compare
wbreza
left a comment
There was a problem hiding this comment.
Code Review: PR #90 — feat: Add trigger heuristic grader
✅ What Looks Good
- Follows existing grader patterns — factory registration,
measureTimewrapper, Grader interface, consistent details map - Clean scoring design —
max(tokenScore, phraseScore)gives two complementary signals - Proper anti-trigger exclusion — strips
DO NOT USE FOR:before keyword extraction - Thoughtful tokenizer — stop words, min length, unicode-aware
- Constructor validation — clear error messages for mode, threshold, skill_path
- Schema, docs, and suggest registration all updated consistently
Findings Summary
| Priority | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 2 |
| Total | 2 |
Overall Assessment: Approve — clean implementation. Low findings are minor observations only.
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR #90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed wbreza review feedback in feb26ef:
|
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR #90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6928ee3 to
ef98d65
Compare
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4ebcdb8 to
9bbce55
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (7)
internal/mcp/server_test.go:119
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:154
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:193
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:244
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:272
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:28
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
internal/mcp/server_test.go:218
- The
returnaftert.Fatal(...)is redundant—t.Fatalalready stops the test goroutine. Removing the extrareturnkeeps the test code cleaner and avoids implyingt.Fatalmight fall through.
if resp == nil {
t.Fatal("expected response, got nil")
return
}
spboyer
left a comment
There was a problem hiding this comment.
Re: return after t.Fatal — these are intentional. staticcheck SA5011 doesn't recognize t.Fatal as a terminating call and flags the subsequent resp.Error access as a possible nil pointer dereference. The return is the standard workaround to satisfy the linter.
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d83dd62 to
f8e0861
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
wbreza
left a comment
There was a problem hiding this comment.
Re-Review: PR #90 — feat: Add trigger heuristic grader
All previous review feedback has been properly addressed:
✅ Stop words updated — Removed 'when', 'use', 'help' to preserve trigger verbs
✅ Double tokenization eliminated — �estPhraseScore now receives pre-computed tokens
✅ Nil context handling added — TestTriggerHeuristicGrader_NilGradingContext covers both nil context and nil TestCase
✅ Empty prompt guard added — Returns clear feedback when prompt is empty
✅ Unique token normalization — Score calculation divides by unique token count to prevent dilution
✅ Comprehensive test coverage — Including duplicate token test verifying normalization works correctly
The implementation is clean, well-tested, and follows existing patterns. All tests pass.
Approval: ✅ Ready to merge
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…'s review - Delete local TriggerHeuristicGraderParams struct - NewTriggerHeuristicGrader takes models.TriggerHeuristicGraderParameters - Simplify Create() case to pass params directly - Update docs: skill_path accepts file or directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…details - Fix doc heading: ### → ## for consistent outline - Report configured skill_path (not resolved absolute) in grader details to keep results portable across machines Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The resolved skillPath was stored in the struct but never referenced after construction. Only configuredPath is used (in Details output). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0c15e1d to
e6de679
Compare
Closes #80