feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18 by spboyer · Pull Request #79 · microsoft/waza

spboyer · 2026-03-04T23:38:26Z

Summary

Brings waza's scoring engine in line with spboyer/sensei v1.3.0. Adds 7 features across scoring and checks:

Changes

Issue	Feature	Type
Closes #72	`WHEN:` trigger pattern recognition	Scoring
Closes #73	`spec-security` check (XML tags, reserved name prefixes)	Spec compliance
Closes #74	`Invalid` score level for >1024 char descriptions	Scoring
Closes #75	Cross-model description density check (advisory 16) — word count and action verb lead	Advisory
Closes #76	Body structure quality check (advisory 17) — actionable instructions, examples, error handling	Advisory
Closes #77	Progressive disclosure check (advisory 18) — body line count and large code block detection	Advisory
Closes #78	Context-dependent anti-trigger risk assessment (severity scales by catalog size)	Scoring

Scope Notes

Advisory checks 16-18 implement the core sensei heuristics for each advisory category. Additional sub-checks (e.g., WHEN: vs USE FOR: preference analysis for advisory 16, inline reference-pattern detection for advisory 18) are deferred to follow-up work — this PR establishes the checker framework and the primary signals.

Files Changed

internal/checks/advisory_checks.go — New: CrossModelDensityChecker, BodyStructureChecker, ProgressiveDisclosureChecker
internal/checks/advisory_checks_test.go — Tests for all 3 advisory checkers
internal/checks/score_checkers.go — Register new checkers in pipeline
internal/checks/spec_checks.go — New: SpecSecurityChecker (checks reserved name prefixes even without FrontmatterRaw)
internal/checks/spec_checks_test.go — Tests for security checker (including nil FrontmatterRaw + reserved prefix)
internal/scoring/scoring.go — WHEN: trigger, Invalid level, context-dependent anti-triggers (severity scales by context), section-bounded counting
internal/scoring/scoring_test.go — Tests for all scoring changes
cmd/waza/dev/display_test.go — Updated golden strings for new checkers
cmd/waza/dev/loop_test.go — Updated golden strings for new checkers

All tests pass (go test ./... — 40 packages).

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Copilot

Pull request overview

Aligns waza’s scoring and compliance checks with sensei v1.3.0 by extending the heuristic scoring model and adding new spec/advisory checkers.

Changes:

Adds AdherenceInvalid and short-circuits scoring when description length exceeds 1024 characters; adds WHEN: to trigger detection and introduces catalog-size-based anti-trigger risk warnings.
Introduces SpecSecurityChecker and registers it in the spec checker pipeline.
Adds and registers advisory checkers for cross-model description density, body structure quality, and progressive disclosure, with corresponding tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
internal/scoring/scoring.go	Adds `Invalid` adherence, `WHEN:` trigger detection, and context-dependent anti-trigger risk logic.
internal/scoring/scoring_test.go	Adds/updates tests for `Invalid`, `WHEN:`, and anti-trigger risk behavior.
internal/checks/spec_checks.go	Adds `SpecSecurityChecker` implementation.
internal/checks/spec_checks_test.go	Adds tests for `SpecSecurityChecker`.
internal/checks/score_checkers.go	Registers the new spec-security and advisory checkers in the pipeline.
internal/checks/advisory_checks.go	Adds advisory checkers (density, body structure, progressive disclosure).
internal/checks/advisory_checks_test.go	Adds tests for the three new advisory checkers.

Comments suppressed due to low confidence (3)

internal/checks/advisory_checks.go:476

regexp.MustCompile is called inside BodyStructureChecker.Check, meaning the regex is recompiled on every check run. Since this runs per skill, consider hoisting this regex to a package-level var (similar to other patterns in this file) to avoid repeated compilation and keep the checker cheaper to run.

This issue also appears on line 544 of the same file.

	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)

internal/checks/advisory_checks.go:546

codeBlockPattern := regexp.MustCompile(...) is created inside ProgressiveDisclosureChecker.Check, so it recompiles for every skill. Consider making it a package-level var so the regex is compiled once and reused across checks.

	// Count large code blocks (>50 lines)
	codeBlockPattern := regexp.MustCompile("(?s)```[^`]*```")
	blocks := codeBlockPattern.FindAllString(sk.RawContent, -1)

internal/checks/advisory_checks.go:543

ProgressiveDisclosureChecker counts lines and code blocks over sk.RawContent, which includes YAML frontmatter. The advisory definition is about the SKILL.md body, so this can over-count and incorrectly trigger warnings. Consider running these checks over sk.Body (or otherwise excluding the frontmatter block) so the thresholds apply to the body content only.

func (*ProgressiveDisclosureChecker) Check(sk skill.Skill) (*CheckResult, error) {
	lines := strings.Split(sk.RawContent, "\n")
	bodyLines := len(lines)

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Orchestration logs: - Linus: Implementation of 7 sensei scoring parity features - Rusty (review): Initial code review — 2 must-fix issues identified - Turk: Fixes applied — checker registration, panic guard - Rusty (re-review): Approval — all issues resolved Session log: - /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md - Gap analysis, 7 issues created (#72-#78) - Implementation, rejection, fix, and approval workflow - PR #79 ready for merge Decision merges from inbox: - User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6 - Sensei parity code review (initial + re-review) - Inbox files cleaned Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Orchestration logs: - Linus: Implementation of 7 sensei scoring parity features - Rusty (review): Initial code review — 2 must-fix issues identified - Turk: Fixes applied — checker registration, panic guard - Rusty (re-review): Approval — all issues resolved Session log: - /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md - Gap analysis, 7 issues created (#72-#78) - Implementation, rejection, fix, and approval workflow - PR #79 ready for merge Decision merges from inbox: - User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6 - Sensei parity code review (initial + re-review) - Inbox files cleaned Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

internal/checks/advisory_checks.go:477

BodyStructureChecker recompiles the numbered-steps regexp on every Check() call via regexp.MustCompile(...). Since this checker can run across many skills, consider moving this to a package-level precompiled regexp (e.g., var numberedStepsRE = regexp.MustCompile(...)) to avoid repeated compilation overhead.

	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)

spboyer

✅ Reviewed by Rusty — LGTM (third review). All 7 sensei parity features (#72-#78) implemented correctly. Checkers registered, panic guard in place, Invalid adherence level, WHEN: triggers, context-dependent risk. 777 additions with comprehensive test coverage. Turk's fixes resolved both must-fix issues. CI green across all checks. Ship it. Ready to merge (can't self-approve since you authored this).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Co-authored-by: Richard Park <ripark@microsoft.com>

wbreza

Code Review: PR #79 - feat: sensei scoring parity

What Looks Good

All 8 prior Copilot review comments addressed - thorough iteration
Comprehensive test coverage - 487 lines of new tests with boundary cases
Clean architecture - all checkers follow ComplianceChecker interface
sectionForHeader() isolates sections correctly - solves trigger/anti-trigger overlap
Invalid adherence level short-circuit - well-designed
Context-dependent anti-trigger risk fires only when anti-triggers are present
skillBodyContent() prefers sk.Body over raw parsing with test verification

Suggestions (non-blocking)

sectionForHeader() repeated strings.ToUpper() - Consider pre-computing once. Minor perf.
errorHandlingPatterns includes error - Broad match. Consider heading-level patterns.

Summary

Priority	Count
Critical	0
High	0
Medium	2
Low	2

Overall Assessment: Approve - solid feature PR with comprehensive testing.

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing - countPhrasesAfterPattern now stops at all trigger headers (USE FOR:, WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections - Anti-trigger risk severity scales with context: High→error, Moderate→warning - SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw is nil (programmatically constructed skills) - ProgressiveDisclosureChecker trims trailing newline before line count to avoid off-by-one at the 500-line boundary - BodyStructureChecker error handling patterns narrowed from broad "error" substring to heading-aware patterns ("## error", "error handling", "## troubleshooting") to reduce false positives - Updated display and loop golden strings for new spec-security and advisory 16-18 checkers - All 40 test packages pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call) - Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid redundant uppercasing on each loop iteration - Fix boundary test name: 48 lines, not 50 Addresses wbreza review feedback on PR #79. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Update golden strings: 'no action verbs' → 'no common lead words' - Clarify spec-security summary when FrontmatterRaw is nil - Add TODO for plumbing SkillCount into production callers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing - countPhrasesAfterPattern now stops at all trigger headers (USE FOR:, WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections - Anti-trigger risk severity scales with context: High→error, Moderate→warning - SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw is nil (programmatically constructed skills) - ProgressiveDisclosureChecker trims trailing newline before line count to avoid off-by-one at the 500-line boundary - BodyStructureChecker error handling patterns narrowed from broad "error" substring to heading-aware patterns ("## error", "error handling", "## troubleshooting") to reduce false positives - Updated display and loop golden strings for new spec-security and advisory 16-18 checkers - All 40 test packages pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call) - Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid redundant uppercasing on each loop iteration - Fix boundary test name: 48 lines, not 50 Addresses wbreza review feedback on PR #79. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Update golden strings: 'no action verbs' → 'no common lead words' - Clarify spec-security summary when FrontmatterRaw is nil - Add TODO for plumbing SkillCount into production callers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

…level, advisory checks 16-18, context-dependent anti-triggers Implements seven features to align waza scoring with spboyer/sensei: - #72: Add 'WHEN:' trigger pattern recognition (cross-model optimized) - #73: Add spec-security checker (flags XML angle brackets and reserved prefixes) - #74: Add Invalid adherence level for descriptions >1024 chars (hard-fail) - #75: Add cross-model density checker (advisory 16: word count >60, action verb check) - #76: Add body structure quality checker (advisory 17: examples, code blocks, error handling) - #77: Add progressive disclosure checker (advisory 18: flags >500 line bodies, >50 line code blocks) - #78: Add context-dependent anti-trigger risk (Low: 1-5 skills, Moderate: 6-14, High: 15+) All changes include comprehensive test coverage. Closes #72, closes #73, closes #74, closes #75, closes #76, closes #77, closes #78 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing - countPhrasesAfterPattern now stops at all trigger headers (USE FOR:, WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections - Anti-trigger risk severity scales with context: High→error, Moderate→warning - SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw is nil (programmatically constructed skills) - ProgressiveDisclosureChecker trims trailing newline before line count to avoid off-by-one at the 500-line boundary - BodyStructureChecker error handling patterns narrowed from broad "error" substring to heading-aware patterns ("## error", "error handling", "## troubleshooting") to reduce false positives - Updated display and loop golden strings for new spec-security and advisory 16-18 checkers - All 40 test packages pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call) - Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid redundant uppercasing on each loop iteration - Fix boundary test name: 48 lines, not 50 Addresses wbreza review feedback on PR #79. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Update golden strings: 'no action verbs' → 'no common lead words' - Clarify spec-security summary when FrontmatterRaw is nil - Add TODO for plumbing SkillCount into production callers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer

Both unresolved threads already addressed: (1) SkillCount — TODO added at scoring.go:116-117, plumbing is follow-up scope. (2) spec-security nil FrontmatterRaw — summary now says 'name prefix check only; raw frontmatter not available'.

Copilot AI review requested due to automatic review settings March 4, 2026 23:38

spboyer requested review from chlowell and richardpark-msft as code owners March 4, 2026 23:38

spboyer added the sensei-parity Parity with spboyer/sensei scoring label Mar 4, 2026

github-actions Bot enabled auto-merge (squash) March 4, 2026 23:38

Copilot started reviewing on behalf of spboyer March 4, 2026 23:39 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

spboyer added a commit that referenced this pull request Mar 4, 2026

fix: address PR #79 review feedback — punctuation stripping, fence pa…

fd138c1

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit that referenced this pull request Mar 4, 2026

docs(squad): log PR #79 feedback fix session

d838cc8

- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution

spboyer added a commit that referenced this pull request Mar 4, 2026

fix: address PR #79 review feedback — punctuation stripping, fence pa…

e80ddc7

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/sensei-parity branch from fd138c1 to e80ddc7 Compare March 4, 2026 23:59

spboyer added a commit that referenced this pull request Mar 5, 2026

docs(squad): log PR #79 feedback fix session

398180e

- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address PR #79 review feedback — punctuation stripping, fence pa…

e9dc6f2

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 00:18

spboyer force-pushed the squad/sensei-parity branch from e80ddc7 to c30d412 Compare March 5, 2026 00:18

Copilot AI mentioned this pull request Mar 5, 2026

fix: update docs link to GitHub Pages URL #87

Merged

Copilot started reviewing on behalf of spboyer March 5, 2026 00:19 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Comment thread internal/scoring/scoring.go

Comment thread internal/scoring/scoring.go

spboyer commented Mar 5, 2026

View reviewed changes

spboyer force-pushed the squad/sensei-parity branch from c30d412 to e8a1573 Compare March 5, 2026 16:00

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #79

da4aa80

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:32

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #79

ad8b723

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/sensei-parity branch from da4aa80 to ad8b723 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026

chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026

remove python tests folder as well (microsoft#79)

ad328f8

Co-authored-by: Richard Park <ripark@microsoft.com>

wbreza previously approved these changes Mar 5, 2026

View reviewed changes

Comment thread internal/scoring/scoring.go Outdated

Comment thread internal/checks/advisory_checks.go

spboyer dismissed wbreza’s stale review via 6bceac7 March 5, 2026 23:07

spboyer added a commit that referenced this pull request Mar 10, 2026

fix: address PR #79 review feedback — punctuation stripping, fence pa…

22b4a7e

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chlowell approved these changes Mar 10, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings March 10, 2026 23:22

spboyer added a commit that referenced this pull request Mar 10, 2026

fix: address PR #79 review feedback — punctuation stripping, fence pa…

49708be

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/sensei-parity branch from a218bdd to d84b276 Compare March 10, 2026 23:22

Copilot started reviewing on behalf of spboyer March 10, 2026 23:23 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Comment thread internal/scoring/scoring.go

Comment thread internal/checks/advisory_checks.go

Comment thread internal/checks/advisory_checks.go

spboyer and others added 6 commits March 11, 2026 10:54

fix: address PR #79 review feedback — punctuation stripping, fence pa…

062365b

…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor: Rename actionVerbs to commonLeadWords in advisory checks

f00631f

spboyer force-pushed the squad/sensei-parity branch from d84b276 to 0036d56 Compare March 11, 2026 17:54

spboyer commented Mar 11, 2026

View reviewed changes

github-actions Bot merged commit 6ff92e5 into main Mar 11, 2026
6 checks passed

spboyer mentioned this pull request Mar 12, 2026

Release v0.21.0 #122

Merged

4 tasks

Uh oh!

Conversation

spboyer commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Scope Notes

Files Changed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

spboyer left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #79 - feat: sensei scoring parity

What Looks Good

Suggestions (non-blocking)

Summary

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spboyer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

spboyer commented Mar 4, 2026 •

edited

Loading