feat: Add trigger heuristic grader by spboyer · Pull Request #90 · microsoft/waza

spboyer · 2026-03-05T01:35:29Z

Closes #80

Copilot

Pull request overview

Adds a new trigger heuristic grader type to validate whether a prompt should (or should not) activate a skill, closing #80.

Changes:

Introduces trigger grader implementation + unit tests.
Wires trigger into grader type registries, creation factory, and JSON schemas.
Adds end-user documentation for configuring and interpreting trigger grading output.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
schemas/task.schema.json	Adds `trigger` grader type and `triggerGraderConfig` definition for task schema validation.
schemas/eval.schema.json	Adds `trigger` grader type and documented `triggerGraderConfig` definition for eval schema validation.
internal/suggest/suggest.go	Includes `trigger` in the set of grader types surfaced by suggest.
internal/suggest/grader_docs.go	Adds a short description for the `trigger` grader in suggest docs.
internal/models/outcome.go	Defines `GraderKindTrigger` constant for the new grader kind.
internal/graders/trigger_grader.go	Implements the trigger heuristic grader (keyword + phrase scoring).
internal/graders/trigger_grader_test.go	Adds tests for constructor validation, mode behavior, thresholds, and factory creation.
internal/graders/grader.go	Extends `Create` factory to construct the trigger grader from config.
docs/graders/trigger.md	Documents `trigger` grader configuration, scoring behavior, and output details.
docs/graders/README.md	Links the new `trigger` grader docs from the graders index.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md	Adds session log entry (process documentation).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md	Adds session log entry (process documentation).
.squad/decisions.md	Records decision notes (process documentation).
.squad/agents/linus/history.md	Records learnings summary for #80 (process documentation).

Comments suppressed due to low confidence (4)

internal/graders/trigger_grader.go:163

bestPhraseScore re-tokenizes the prompt even though scorePrompt already computed promptTokens. Consider passing promptTokens (and/or a precomputed token set + lowercased prompt) into bestPhraseScore to avoid duplicate tokenization and allocation on every grade call.

func (g *triggerHeuristicGrader) scorePrompt(prompt string) (score float64, phraseScore float64, matched []string) {
	promptTokens := tokenize(prompt)
	if len(promptTokens) == 0 {
		return 0, 0, nil
	}

	seen := make(map[string]bool)
	for _, token := range promptTokens {
		if _, ok := g.keywords[token]; ok && !seen[token] {
			seen[token] = true
			matched = append(matched, token)
		}
	}

	tokenScore := float64(len(matched)) / float64(len(promptTokens))
	phraseScore = g.bestPhraseScore(prompt)
	if phraseScore > tokenScore {
		return phraseScore, phraseScore, matched
	}
	return tokenScore, phraseScore, matched
}

func (g *triggerHeuristicGrader) bestPhraseScore(prompt string) float64 {
	if len(g.triggerPhrases) == 0 {
		return 0
	}

	promptLower := strings.ToLower(prompt)
	promptTokenSet := make(map[string]struct{})
	for _, token := range tokenize(prompt) {
		promptTokenSet[token] = struct{}{}
	}

internal/graders/trigger_grader.go:203

The error messages hardcode SKILL.md, but skillPath may be any filename/path (even though docs recommend SKILL.md). Consider changing these messages to reference “skill file” (or include the provided path without assuming the basename) to avoid misleading users during debugging.

func loadTriggerHeuristicData(skillPath string) (map[string]struct{}, []string, error) {
	data, err := os.ReadFile(skillPath)
	if err != nil {
		return nil, nil, fmt.Errorf("reading SKILL.md %s: %w", skillPath, err)
	}

	var sk skill.Skill
	if err := sk.UnmarshalText(data); err != nil {
		return nil, nil, fmt.Errorf("parsing SKILL.md %s: %w", skillPath, err)
	}

schemas/task.schema.json:868

triggerGraderConfig in task.schema.json is missing the description fields that were added in eval.schema.json (for the object and its properties). Adding the same descriptions here would keep the schemas consistent and improve generated docs / editor intellisense for task configs.

    "triggerGraderConfig": {
      "type": "object",
      "required": [
        "skill_path",
        "mode"
      ],
      "additionalProperties": false,
      "properties": {
        "skill_path": {
          "type": "string",
          "minLength": 1
        },
        "mode": {
          "type": "string",
          "enum": [
            "positive",
            "negative"
          ]
        },
        "threshold": {
          "type": "number",
          "minimum": 0,
          "maximum": 1,
          "default": 0.6
        }
      }
    },

internal/graders/trigger_grader_test.go:141

Prefer require.GreaterOrEqual(t, result.Score, threshold) for clearer failure output and consistency with other assertions in this test file.

	require.True(t, result.Score >= threshold)
	require.True(t, result.Passed)

chlowell · 2026-03-05T02:21:21Z

I like the consistency of having everything be a grader instead of having graders and trigger tests defined separately as we do today (for example). Do you want to remove the existing trigger test feature in this PR?

spboyer

Rusty (Opus 4.6) — ⚠️ Blocked: Merge Conflict

The trigger heuristic grader implementation is solid architecturally — clean grader pattern, constructor validation, 4 test functions, \docs/graders/trigger.md, schema updates. I want to approve this.

Blocker: Merge conflict. GitHub reports \mergeable: CONFLICTING. Only 1 CI check (CLA) ran — the full Go CI matrix didn't execute.

Action needed:

Rebase onto main to resolve the merge conflict (likely in \internal/models/outcome.go\ where fields overlap with PR #91)
Verify all 7 CI checks pass
Push the rebased branch

Once CI is green, I'll approve immediately.

codecov-commenter · 2026-03-05T15:21:56Z

Codecov Report

❌ Patch coverage is 82.92683% with 28 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@9b9084c). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
internal/graders/trigger_grader.go	84.17%	15 Missing and 10 partials ⚠️
internal/models/grader_params.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #90   +/-   ##
=======================================
  Coverage        ?   73.37%           
=======================================
  Files           ?      132           
  Lines           ?    15231           
  Branches        ?        0           
=======================================
  Hits            ?    11176           
  Misses          ?     3238           
  Partials        ?      817

Flag	Coverage Δ
go-implementation	`73.37% <82.92%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

wbreza

Code Review: PR #90 - feat: Add trigger heuristic grader

What Looks Good

Follows existing grader patterns
Clean mode separation with good tests
Smart keyword extraction strips DO NOT USE FOR content
Dual scoring approach with good coverage
Thorough constructor validation and docs included

Suggestions (non-blocking)

Stop words include trigger verbs (use, when, help). Consider excluding from stop words.

Summary

Priority	Count
Critical	0
High	0
Medium	1
Low	2

Overall Assessment: Approve

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

wbreza

Code Review: PR #90 — feat: Add trigger heuristic grader

✅ What Looks Good

Follows existing grader patterns — factory registration, measureTime wrapper, Grader interface, consistent details map
Clean scoring design — max(tokenScore, phraseScore) gives two complementary signals
Proper anti-trigger exclusion — strips DO NOT USE FOR: before keyword extraction
Thoughtful tokenizer — stop words, min length, unicode-aware
Constructor validation — clear error messages for mode, threshold, skill_path
Schema, docs, and suggest registration all updated consistently

Findings Summary

Priority	Count
Critical	0
High	0
Medium	0
Low	2
Total	2

Overall Assessment: Approve — clean implementation. Low findings are minor observations only.

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR #90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-09T22:07:06Z

Addressed wbreza review feedback in feb26ef:

Removed 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring
Eliminated double tokenization by passing pre-computed promptTokens from scorePrompt to bestPhraseScore
Added TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR #90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (7)

internal/mcp/server_test.go:119

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:154

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:193

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:244

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:272

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:28

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:218

The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.

	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

spboyer

Re: return after t.Fatal — these are intentional. staticcheck SA5011 doesn't recognize t.Fatal as a terminating call and flags the subsequent resp.Error access as a possible nil pointer dereference. The return is the standard workaround to satisfy the linter.

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

You can also share your feedback on Copilot code review. Take the survey.

wbreza

Re-Review: PR #90 — feat: Add trigger heuristic grader

All previous review feedback has been properly addressed:

✅ Stop words updated — Removed 'when', 'use', 'help' to preserve trigger verbs
✅ Double tokenization eliminated — �estPhraseScore now receives pre-computed tokens
✅ Nil context handling added — TestTriggerHeuristicGrader_NilGradingContext covers both nil context and nil TestCase
✅ Empty prompt guard added — Returns clear feedback when prompt is empty
✅ Unique token normalization — Score calculation divides by unique token count to prevent dilution
✅ Comprehensive test coverage — Including duplicate token test verifying normalization works correctly

The implementation is clean, well-tested, and follows existing patterns. All tests pass.

Approval: ✅ Ready to merge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ion, add nil context test - Remove 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring - Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to avoid redundant tokenize() call - Add TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths Addresses wbreza review feedback on PR microsoft#90. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Normalize scorePrompt by unique token count so repeated words don't artificially dilute the relevance score - Accept both file and directory paths for skill_path (appends SKILL.md when given a directory) - Add tests for both fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…'s review - Delete local TriggerHeuristicGraderParams struct - NewTriggerHeuristicGrader takes models.TriggerHeuristicGraderParameters - Simplify Create() case to pass params directly - Update docs: skill_path accepts file or directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…details - Fix doc heading: ### → ## for consistent outline - Report configured skill_path (not resolved absolute) in grader details to keep results portable across machines Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The resolved skillPath was stored in the struct but never referenced after construction. Only configuredPath is used (in Details output). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer requested a review from chlowell as a code owner March 5, 2026 01:35

Copilot AI review requested due to automatic review settings March 5, 2026 01:35

spboyer requested a review from richardpark-msft as a code owner March 5, 2026 01:35

spboyer self-assigned this Mar 5, 2026

github-actions Bot enabled auto-merge (squash) March 5, 2026 01:35

Copilot AI reviewed Mar 5, 2026

View reviewed changes

spboyer commented Mar 5, 2026

View reviewed changes

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 7d59e0e to 671e8a9 Compare March 5, 2026 15:18

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 671e8a9 to 9fec0b1 Compare March 5, 2026 17:12

Copilot AI review requested due to automatic review settings March 5, 2026 17:12

Copilot AI reviewed Mar 5, 2026

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 9fec0b1 to 3f422a9 Compare March 5, 2026 17:46

chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026

feat: create waza SKILL.md (microsoft#52) (microsoft#90)

1e995e3

wbreza approved these changes Mar 5, 2026

View reviewed changes

Comment thread internal/graders/trigger_grader.go

Copilot AI review requested due to automatic review settings March 6, 2026 00:04

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 3f422a9 to e357ce2 Compare March 6, 2026 00:04

Copilot started reviewing on behalf of spboyer March 6, 2026 00:06 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

Comment thread internal/graders/trigger_grader.go

Comment thread internal/suggest/suggest.go

Comment thread docs/graders/trigger.md Outdated

wbreza previously approved these changes Mar 6, 2026

View reviewed changes

Comment thread internal/graders/trigger_grader_test.go

Comment thread internal/graders/trigger_grader.go

spboyer dismissed wbreza’s stale review via 6928ee3 March 9, 2026 22:24

Copilot AI review requested due to automatic review settings March 10, 2026 16:13

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 6928ee3 to ef98d65 Compare March 10, 2026 16:13

Copilot started reviewing on behalf of spboyer March 10, 2026 16:14 View session

Copilot AI review requested due to automatic review settings March 11, 2026 14:47

Copilot started reviewing on behalf of spboyer March 11, 2026 14:48 View session

spboyer requested a review from richardpark-msft March 11, 2026 14:52

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 4ebcdb8 to 9bbce55 Compare March 11, 2026 14:53

Copilot AI reviewed Mar 11, 2026

View reviewed changes

Comment thread internal/mcp/server_test.go

Comment thread docs/graders/trigger.md Outdated

Comment thread internal/graders/trigger_grader.go

spboyer commented Mar 11, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings March 11, 2026 18:02

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from d83dd62 to f8e0861 Compare March 11, 2026 18:02

Copilot started reviewing on behalf of spboyer March 11, 2026 18:02 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

Comment thread internal/graders/trigger_grader.go

wbreza approved these changes Mar 11, 2026

View reviewed changes

spboyer and others added 8 commits March 11, 2026 13:26

feat: add trigger heuristic grader microsoft#80

1d1e114

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: Align trigger grader heading style

0649195

fix: add return after t.Fatal to satisfy staticcheck SA5011

ae2ee39

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor: remove unused skillPath field from triggerHeuristicGrader

e6de679

The resolved skillPath was stored in the struct but never referenced after construction. Only configuredPath is used (in Details output). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 11, 2026 20:26

spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 0c15e1d to e6de679 Compare March 11, 2026 20:26

Copilot started reviewing on behalf of spboyer March 11, 2026 20:27 View session

github-actions Bot merged commit d1b8c25 into microsoft:main Mar 11, 2026
8 checks passed

spboyer mentioned this pull request Mar 12, 2026

Release v0.21.0 #122

Merged

4 tasks

spboyer removed the request for review from Copilot March 23, 2026 19:24

spboyer mentioned this pull request Apr 22, 2026

feat: Add trigger heuristic grader #80

Closed

4 tasks

Uh oh!

Conversation

spboyer commented Mar 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chlowell commented Mar 5, 2026

Uh oh!

spboyer left a comment

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #90 - feat: Add trigger heuristic grader

What Looks Good

Suggestions (non-blocking)

Summary

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #90 — feat: Add trigger heuristic grader

✅ What Looks Good

Findings Summary

Uh oh!

Uh oh!

Uh oh!

spboyer commented Mar 9, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spboyer left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Re-Review: PR #90 — feat: Add trigger heuristic grader

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

codecov-commenter commented Mar 5, 2026 •

edited

Loading