Skip to content

feat: Add trigger heuristic grader#90

Merged
github-actions[bot] merged 8 commits into
microsoft:mainfrom
spboyer:squad/80-trigger-heuristic-grader
Mar 11, 2026
Merged

feat: Add trigger heuristic grader#90
github-actions[bot] merged 8 commits into
microsoft:mainfrom
spboyer:squad/80-trigger-heuristic-grader

Conversation

@spboyer

@spboyer spboyer commented Mar 5, 2026

Copy link
Copy Markdown
Member

Closes #80

@spboyer spboyer requested a review from chlowell as a code owner March 5, 2026 01:35
Copilot AI review requested due to automatic review settings March 5, 2026 01:35
@spboyer spboyer self-assigned this Mar 5, 2026
@github-actions github-actions Bot enabled auto-merge (squash) March 5, 2026 01:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new trigger heuristic grader type to validate whether a prompt should (or should not) activate a skill, closing #80.

Changes:

  • Introduces trigger grader implementation + unit tests.
  • Wires trigger into grader type registries, creation factory, and JSON schemas.
  • Adds end-user documentation for configuring and interpreting trigger grading output.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
schemas/task.schema.json Adds trigger grader type and triggerGraderConfig definition for task schema validation.
schemas/eval.schema.json Adds trigger grader type and documented triggerGraderConfig definition for eval schema validation.
internal/suggest/suggest.go Includes trigger in the set of grader types surfaced by suggest.
internal/suggest/grader_docs.go Adds a short description for the trigger grader in suggest docs.
internal/models/outcome.go Defines GraderKindTrigger constant for the new grader kind.
internal/graders/trigger_grader.go Implements the trigger heuristic grader (keyword + phrase scoring).
internal/graders/trigger_grader_test.go Adds tests for constructor validation, mode behavior, thresholds, and factory creation.
internal/graders/grader.go Extends Create factory to construct the trigger grader from config.
docs/graders/trigger.md Documents trigger grader configuration, scoring behavior, and output details.
docs/graders/README.md Links the new trigger grader docs from the graders index.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md Adds session log entry (process documentation).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md Adds session log entry (process documentation).
.squad/decisions.md Records decision notes (process documentation).
.squad/agents/linus/history.md Records learnings summary for #80 (process documentation).
Comments suppressed due to low confidence (4)

internal/graders/trigger_grader.go:163

  • bestPhraseScore re-tokenizes the prompt even though scorePrompt already computed promptTokens. Consider passing promptTokens (and/or a precomputed token set + lowercased prompt) into bestPhraseScore to avoid duplicate tokenization and allocation on every grade call.
func (g *triggerHeuristicGrader) scorePrompt(prompt string) (score float64, phraseScore float64, matched []string) {
	promptTokens := tokenize(prompt)
	if len(promptTokens) == 0 {
		return 0, 0, nil
	}

	seen := make(map[string]bool)
	for _, token := range promptTokens {
		if _, ok := g.keywords[token]; ok && !seen[token] {
			seen[token] = true
			matched = append(matched, token)
		}
	}

	tokenScore := float64(len(matched)) / float64(len(promptTokens))
	phraseScore = g.bestPhraseScore(prompt)
	if phraseScore > tokenScore {
		return phraseScore, phraseScore, matched
	}
	return tokenScore, phraseScore, matched
}

func (g *triggerHeuristicGrader) bestPhraseScore(prompt string) float64 {
	if len(g.triggerPhrases) == 0 {
		return 0
	}

	promptLower := strings.ToLower(prompt)
	promptTokenSet := make(map[string]struct{})
	for _, token := range tokenize(prompt) {
		promptTokenSet[token] = struct{}{}
	}

internal/graders/trigger_grader.go:203

  • The error messages hardcode SKILL.md, but skillPath may be any filename/path (even though docs recommend SKILL.md). Consider changing these messages to reference “skill file” (or include the provided path without assuming the basename) to avoid misleading users during debugging.
func loadTriggerHeuristicData(skillPath string) (map[string]struct{}, []string, error) {
	data, err := os.ReadFile(skillPath)
	if err != nil {
		return nil, nil, fmt.Errorf("reading SKILL.md %s: %w", skillPath, err)
	}

	var sk skill.Skill
	if err := sk.UnmarshalText(data); err != nil {
		return nil, nil, fmt.Errorf("parsing SKILL.md %s: %w", skillPath, err)
	}

schemas/task.schema.json:868

  • triggerGraderConfig in task.schema.json is missing the description fields that were added in eval.schema.json (for the object and its properties). Adding the same descriptions here would keep the schemas consistent and improve generated docs / editor intellisense for task configs.
    "triggerGraderConfig": {
      "type": "object",
      "required": [
        "skill_path",
        "mode"
      ],
      "additionalProperties": false,
      "properties": {
        "skill_path": {
          "type": "string",
          "minLength": 1
        },
        "mode": {
          "type": "string",
          "enum": [
            "positive",
            "negative"
          ]
        },
        "threshold": {
          "type": "number",
          "minimum": 0,
          "maximum": 1,
          "default": 0.6
        }
      }
    },

internal/graders/trigger_grader_test.go:141

  • Prefer require.GreaterOrEqual(t, result.Score, threshold) for clearer failure output and consistency with other assertions in this test file.
	require.True(t, result.Score >= threshold)
	require.True(t, result.Passed)

@chlowell

chlowell commented Mar 5, 2026

Copy link
Copy Markdown
Member

I like the consistency of having everything be a grader instead of having graders and trigger tests defined separately as we do today (for example). Do you want to remove the existing trigger test feature in this PR?

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rusty (Opus 4.6) — ⚠️ Blocked: Merge Conflict

The trigger heuristic grader implementation is solid architecturally — clean grader pattern, constructor validation, 4 test functions, \docs/graders/trigger.md, schema updates. I want to approve this.

Blocker: Merge conflict. GitHub reports \mergeable: CONFLICTING. Only 1 CI check (CLA) ran — the full Go CI matrix didn't execute.

Action needed:

  1. Rebase onto main to resolve the merge conflict (likely in \internal/models/outcome.go\ where fields overlap with PR #91)
  2. Verify all 7 CI checks pass
  3. Push the rebased branch

Once CI is green, I'll approve immediately.

@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 7d59e0e to 671e8a9 Compare March 5, 2026 15:18
@codecov-commenter

codecov-commenter commented Mar 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.92683% with 28 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@9b9084c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
internal/graders/trigger_grader.go 84.17% 15 Missing and 10 partials ⚠️
internal/models/grader_params.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #90   +/-   ##
=======================================
  Coverage        ?   73.37%           
=======================================
  Files           ?      132           
  Lines           ?    15231           
  Branches        ?        0           
=======================================
  Hits            ?    11176           
  Misses          ?     3238           
  Partials        ?      817           
Flag Coverage Δ
go-implementation 73.37% <82.92%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 671e8a9 to 9fec0b1 Compare March 5, 2026 17:12
Copilot AI review requested due to automatic review settings March 5, 2026 17:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 9fec0b1 to 3f422a9 Compare March 5, 2026 17:46
chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026

@wbreza wbreza left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #90 - feat: Add trigger heuristic grader

What Looks Good

  • Follows existing grader patterns
  • Clean mode separation with good tests
  • Smart keyword extraction strips DO NOT USE FOR content
  • Dual scoring approach with good coverage
  • Thorough constructor validation and docs included

Suggestions (non-blocking)

  1. Stop words include trigger verbs (use, when, help). Consider excluding from stop words.

Summary

Priority Count
Critical 0
High 0
Medium 1
Low 2

Overall Assessment: Approve

Comment thread internal/graders/trigger_grader.go
Copilot AI review requested due to automatic review settings March 6, 2026 00:04
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 3f422a9 to e357ce2 Compare March 6, 2026 00:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment thread internal/graders/trigger_grader.go
Comment thread internal/suggest/suggest.go
Comment thread docs/graders/trigger.md Outdated
wbreza
wbreza previously approved these changes Mar 6, 2026

@wbreza wbreza left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #90 — feat: Add trigger heuristic grader

✅ What Looks Good

  • Follows existing grader patterns — factory registration, measureTime wrapper, Grader interface, consistent details map
  • Clean scoring designmax(tokenScore, phraseScore) gives two complementary signals
  • Proper anti-trigger exclusion — strips DO NOT USE FOR: before keyword extraction
  • Thoughtful tokenizer — stop words, min length, unicode-aware
  • Constructor validation — clear error messages for mode, threshold, skill_path
  • Schema, docs, and suggest registration all updated consistently

Findings Summary

Priority Count
Critical 0
High 0
Medium 0
Low 2
Total 2

Overall Assessment: Approve — clean implementation. Low findings are minor observations only.

Comment thread internal/graders/trigger_grader_test.go
Comment thread internal/graders/trigger_grader.go
spboyer added a commit that referenced this pull request Mar 9, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR #90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer

spboyer commented Mar 9, 2026

Copy link
Copy Markdown
Member Author

Addressed wbreza review feedback in feb26ef:

  • Removed 'when', 'use', 'help' from stop words — these are common trigger verbs that should contribute to keyword scoring
  • Eliminated double tokenization by passing pre-computed promptTokens from scorePrompt to bestPhraseScore
  • Added TestTriggerHeuristicGrader_NilGradingContext covering nil context and nil TestCase paths

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 9, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR microsoft#90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR #90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 10, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR microsoft#90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 16:13
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 6928ee3 to ef98d65 Compare March 10, 2026 16:13
Copilot AI review requested due to automatic review settings March 11, 2026 14:47
spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 11, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR microsoft#90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer pushed a commit to spboyer/waza-fk that referenced this pull request Mar 11, 2026
- Normalize scorePrompt by unique token count so repeated words
  don't artificially dilute the relevance score
- Accept both file and directory paths for skill_path (appends
  SKILL.md when given a directory)
- Add tests for both fixes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 4ebcdb8 to 9bbce55 Compare March 11, 2026 14:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (7)

internal/mcp/server_test.go:119

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:154

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:193

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:244

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:272

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:28

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

internal/mcp/server_test.go:218

  • The return after t.Fatal(...) is redundant—t.Fatal already stops the test goroutine. Removing the extra return keeps the test code cleaner and avoids implying t.Fatal might fall through.
	if resp == nil {
		t.Fatal("expected response, got nil")
		return
	}

Comment thread internal/mcp/server_test.go
Comment thread docs/graders/trigger.md Outdated
Comment thread internal/graders/trigger_grader.go

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: return after t.Fatal — these are intentional. staticcheck SA5011 doesn't recognize t.Fatal as a terminating call and flags the subsequent resp.Error access as a possible nil pointer dereference. The return is the standard workaround to satisfy the linter.

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 11, 2026
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR microsoft#90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer pushed a commit to spboyer/waza-fk that referenced this pull request Mar 11, 2026
- Normalize scorePrompt by unique token count so repeated words
  don't artificially dilute the relevance score
- Accept both file and directory paths for skill_path (appends
  SKILL.md when given a directory)
- Add tests for both fixes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 18:02
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from d83dd62 to f8e0861 Compare March 11, 2026 18:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.


You can also share your feedback on Copilot code review. Take the survey.

Comment thread internal/graders/trigger_grader.go

@wbreza wbreza left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: PR #90 — feat: Add trigger heuristic grader

All previous review feedback has been properly addressed:

Stop words updated — Removed 'when', 'use', 'help' to preserve trigger verbs
Double tokenization eliminated — �estPhraseScore now receives pre-computed tokens
Nil context handling added — TestTriggerHeuristicGrader_NilGradingContext covers both nil context and nil TestCase
Empty prompt guard added — Returns clear feedback when prompt is empty
Unique token normalization — Score calculation divides by unique token count to prevent dilution
Comprehensive test coverage — Including duplicate token test verifying normalization works correctly

The implementation is clean, well-tested, and follows existing patterns. All tests pass.

Approval: ✅ Ready to merge

spboyer and others added 8 commits March 11, 2026 13:26
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion, add nil context test

- Remove 'when', 'use', 'help' from stop words — these are common trigger
  verbs that should contribute to keyword scoring
- Pass pre-computed promptTokens from scorePrompt to bestPhraseScore to
  avoid redundant tokenize() call
- Add TestTriggerHeuristicGrader_NilGradingContext covering nil context
  and nil TestCase paths

Addresses wbreza review feedback on PR microsoft#90.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Normalize scorePrompt by unique token count so repeated words
  don't artificially dilute the relevance score
- Accept both file and directory paths for skill_path (appends
  SKILL.md when given a directory)
- Add tests for both fixes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…'s review

- Delete local TriggerHeuristicGraderParams struct
- NewTriggerHeuristicGrader takes models.TriggerHeuristicGraderParameters
- Simplify Create() case to pass params directly
- Update docs: skill_path accepts file or directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…details

- Fix doc heading: ### → ## for consistent outline
- Report configured skill_path (not resolved absolute) in grader
  details to keep results portable across machines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The resolved skillPath was stored in the struct but never referenced
after construction. Only configuredPath is used (in Details output).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 20:26
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 0c15e1d to e6de679 Compare March 11, 2026 20:26
@github-actions github-actions Bot merged commit d1b8c25 into microsoft:main Mar 11, 2026
8 checks passed
@spboyer spboyer mentioned this pull request Mar 12, 2026
4 tasks
@spboyer spboyer removed the request for review from Copilot March 23, 2026 19:24
@spboyer spboyer mentioned this pull request Apr 22, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add trigger heuristic grader

7 participants