Skip to content

feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18#79

Merged
github-actions[bot] merged 6 commits into
mainfrom
squad/sensei-parity
Mar 11, 2026
Merged

feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18#79
github-actions[bot] merged 6 commits into
mainfrom
squad/sensei-parity

Conversation

@spboyer

@spboyer spboyer commented Mar 4, 2026

Copy link
Copy Markdown
Member

Summary

Brings waza's scoring engine in line with spboyer/sensei v1.3.0. Adds 7 features across scoring and checks:

Changes

Issue Feature Type
Closes #72 WHEN: trigger pattern recognition Scoring
Closes #73 spec-security check (XML tags, reserved name prefixes) Spec compliance
Closes #74 Invalid score level for >1024 char descriptions Scoring
Closes #75 Cross-model description density check (advisory 16) — word count and action verb lead Advisory
Closes #76 Body structure quality check (advisory 17) — actionable instructions, examples, error handling Advisory
Closes #77 Progressive disclosure check (advisory 18) — body line count and large code block detection Advisory
Closes #78 Context-dependent anti-trigger risk assessment (severity scales by catalog size) Scoring

Scope Notes

Advisory checks 16-18 implement the core sensei heuristics for each advisory category. Additional sub-checks (e.g., WHEN: vs USE FOR: preference analysis for advisory 16, inline reference-pattern detection for advisory 18) are deferred to follow-up work — this PR establishes the checker framework and the primary signals.

Files Changed

  • internal/checks/advisory_checks.go — New: CrossModelDensityChecker, BodyStructureChecker, ProgressiveDisclosureChecker
  • internal/checks/advisory_checks_test.go — Tests for all 3 advisory checkers
  • internal/checks/score_checkers.go — Register new checkers in pipeline
  • internal/checks/spec_checks.go — New: SpecSecurityChecker (checks reserved name prefixes even without FrontmatterRaw)
  • internal/checks/spec_checks_test.go — Tests for security checker (including nil FrontmatterRaw + reserved prefix)
  • internal/scoring/scoring.go — WHEN: trigger, Invalid level, context-dependent anti-triggers (severity scales by context), section-bounded counting
  • internal/scoring/scoring_test.go — Tests for all scoring changes
  • cmd/waza/dev/display_test.go — Updated golden strings for new checkers
  • cmd/waza/dev/loop_test.go — Updated golden strings for new checkers

All tests pass (go test ./... — 40 packages).

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Copilot AI review requested due to automatic review settings March 4, 2026 23:38
@spboyer spboyer added the sensei-parity Parity with spboyer/sensei scoring label Mar 4, 2026
@github-actions github-actions Bot enabled auto-merge (squash) March 4, 2026 23:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns waza’s scoring and compliance checks with sensei v1.3.0 by extending the heuristic scoring model and adding new spec/advisory checkers.

Changes:

  • Adds AdherenceInvalid and short-circuits scoring when description length exceeds 1024 characters; adds WHEN: to trigger detection and introduces catalog-size-based anti-trigger risk warnings.
  • Introduces SpecSecurityChecker and registers it in the spec checker pipeline.
  • Adds and registers advisory checkers for cross-model description density, body structure quality, and progressive disclosure, with corresponding tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
internal/scoring/scoring.go Adds Invalid adherence, WHEN: trigger detection, and context-dependent anti-trigger risk logic.
internal/scoring/scoring_test.go Adds/updates tests for Invalid, WHEN:, and anti-trigger risk behavior.
internal/checks/spec_checks.go Adds SpecSecurityChecker implementation.
internal/checks/spec_checks_test.go Adds tests for SpecSecurityChecker.
internal/checks/score_checkers.go Registers the new spec-security and advisory checkers in the pipeline.
internal/checks/advisory_checks.go Adds advisory checkers (density, body structure, progressive disclosure).
internal/checks/advisory_checks_test.go Adds tests for the three new advisory checkers.
Comments suppressed due to low confidence (3)

internal/checks/advisory_checks.go:476

  • regexp.MustCompile is called inside BodyStructureChecker.Check, meaning the regex is recompiled on every check run. Since this runs per skill, consider hoisting this regex to a package-level var (similar to other patterns in this file) to avoid repeated compilation and keep the checker cheaper to run.

This issue also appears on line 544 of the same file.

	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)

internal/checks/advisory_checks.go:546

  • codeBlockPattern := regexp.MustCompile(...) is created inside ProgressiveDisclosureChecker.Check, so it recompiles for every skill. Consider making it a package-level var so the regex is compiled once and reused across checks.
	// Count large code blocks (>50 lines)
	codeBlockPattern := regexp.MustCompile("(?s)```[^`]*```")
	blocks := codeBlockPattern.FindAllString(sk.RawContent, -1)

internal/checks/advisory_checks.go:543

  • ProgressiveDisclosureChecker counts lines and code blocks over sk.RawContent, which includes YAML frontmatter. The advisory definition is about the SKILL.md body, so this can over-count and incorrectly trigger warnings. Consider running these checks over sk.Body (or otherwise excluding the frontmatter block) so the thresholds apply to the body content only.
func (*ProgressiveDisclosureChecker) Check(sk skill.Skill) (*CheckResult, error) {
	lines := strings.Split(sk.RawContent, "\n")
	bodyLines := len(lines)

Comment thread internal/checks/advisory_checks.go
Comment thread internal/checks/advisory_checks.go Outdated
Comment thread internal/checks/spec_checks.go Outdated
Comment thread internal/scoring/scoring.go Outdated
Comment thread internal/checks/advisory_checks.go
Comment thread internal/checks/advisory_checks.go Outdated
spboyer added a commit that referenced this pull request Mar 4, 2026
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 4, 2026
Orchestration logs:
- Linus: Implementation of 7 sensei scoring parity features
- Rusty (review): Initial code review — 2 must-fix issues identified
- Turk: Fixes applied — checker registration, panic guard
- Rusty (re-review): Approval — all issues resolved

Session log:
- /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md
- Gap analysis, 7 issues created (#72-#78)
- Implementation, rejection, fix, and approval workflow
- PR #79 ready for merge

Decision merges from inbox:
- User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6
- Sensei parity code review (initial + re-review)
- Inbox files cleaned

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 4, 2026
- Orchestration log: Linus fixed all 6 PR review comments
- Session log: All threads resolved, tests passing
- Model: gpt-5.3-codex on sync execution
spboyer added a commit that referenced this pull request Mar 4, 2026
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/sensei-parity branch from fd138c1 to e80ddc7 Compare March 4, 2026 23:59
spboyer added a commit that referenced this pull request Mar 5, 2026
Orchestration logs:
- Linus: Implementation of 7 sensei scoring parity features
- Rusty (review): Initial code review — 2 must-fix issues identified
- Turk: Fixes applied — checker registration, panic guard
- Rusty (re-review): Approval — all issues resolved

Session log:
- /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md
- Gap analysis, 7 issues created (#72-#78)
- Implementation, rejection, fix, and approval workflow
- PR #79 ready for merge

Decision merges from inbox:
- User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6
- Sensei parity code review (initial + re-review)
- Inbox files cleaned

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 5, 2026
- Orchestration log: Linus fixed all 6 PR review comments
- Session log: All threads resolved, tests passing
- Model: gpt-5.3-codex on sync execution
spboyer added a commit that referenced this pull request Mar 5, 2026
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 5, 2026 00:18
@spboyer spboyer force-pushed the squad/sensei-parity branch from e80ddc7 to c30d412 Compare March 5, 2026 00:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

internal/checks/advisory_checks.go:477

  • BodyStructureChecker recompiles the numbered-steps regexp on every Check() call via regexp.MustCompile(...). Since this checker can run across many skills, consider moving this to a package-level precompiled regexp (e.g., var numberedStepsRE = regexp.MustCompile(...)) to avoid repeated compilation overhead.
	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)

Comment thread internal/scoring/scoring.go
Comment thread internal/scoring/scoring.go

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Reviewed by Rusty — LGTM (third review). All 7 sensei parity features (#72-#78) implemented correctly. Checkers registered, panic guard in place, Invalid adherence level, WHEN: triggers, context-dependent risk. 777 additions with comprehensive test coverage. Turk's fixes resolved both must-fix issues. CI green across all checks. Ship it. Ready to merge (can't self-approve since you authored this).

@spboyer spboyer force-pushed the squad/sensei-parity branch from c30d412 to e8a1573 Compare March 5, 2026 16:00
spboyer added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 5, 2026 17:32
spboyer added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/sensei-parity branch from da4aa80 to ad8b723 Compare March 5, 2026 17:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026
Co-authored-by: Richard Park <ripark@microsoft.com>
wbreza
wbreza previously approved these changes Mar 5, 2026

@wbreza wbreza left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #79 - feat: sensei scoring parity

What Looks Good

  • All 8 prior Copilot review comments addressed - thorough iteration
  • Comprehensive test coverage - 487 lines of new tests with boundary cases
  • Clean architecture - all checkers follow ComplianceChecker interface
  • sectionForHeader() isolates sections correctly - solves trigger/anti-trigger overlap
  • Invalid adherence level short-circuit - well-designed
  • Context-dependent anti-trigger risk fires only when anti-triggers are present
  • skillBodyContent() prefers sk.Body over raw parsing with test verification

Suggestions (non-blocking)

  1. sectionForHeader() repeated strings.ToUpper() - Consider pre-computing once. Minor perf.
  2. errorHandlingPatterns includes error - Broad match. Consider heading-level patterns.

Summary

Priority Count
Critical 0
High 0
Medium 2
Low 2

Overall Assessment: Approve - solid feature PR with comprehensive testing.

Comment thread internal/scoring/scoring.go Outdated
Comment thread internal/checks/advisory_checks.go
spboyer added a commit that referenced this pull request Mar 10, 2026
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing

- countPhrasesAfterPattern now stops at all trigger headers (USE FOR:,
  WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections
- Anti-trigger risk severity scales with context: High→error, Moderate→warning
- SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw
  is nil (programmatically constructed skills)
- ProgressiveDisclosureChecker trims trailing newline before line count to
  avoid off-by-one at the 500-line boundary
- BodyStructureChecker error handling patterns narrowed from broad "error"
  substring to heading-aware patterns ("## error", "error handling",
  "## troubleshooting") to reduce false positives
- Updated display and loop golden strings for new spec-security and
  advisory 16-18 checkers
- All 40 test packages pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call)
- Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid
  redundant uppercasing on each loop iteration
- Fix boundary test name: 48 lines, not 50

Addresses wbreza review feedback on PR #79.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer pushed a commit that referenced this pull request Mar 10, 2026
- Update golden strings: 'no action verbs' → 'no common lead words'
- Clarify spec-security summary when FrontmatterRaw is nil
- Add TODO for plumbing SkillCount into production callers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 23:22
spboyer added a commit that referenced this pull request Mar 10, 2026
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing

- countPhrasesAfterPattern now stops at all trigger headers (USE FOR:,
  WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections
- Anti-trigger risk severity scales with context: High→error, Moderate→warning
- SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw
  is nil (programmatically constructed skills)
- ProgressiveDisclosureChecker trims trailing newline before line count to
  avoid off-by-one at the 500-line boundary
- BodyStructureChecker error handling patterns narrowed from broad "error"
  substring to heading-aware patterns ("## error", "error handling",
  "## troubleshooting") to reduce false positives
- Updated display and loop golden strings for new spec-security and
  advisory 16-18 checkers
- All 40 test packages pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call)
- Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid
  redundant uppercasing on each loop iteration
- Fix boundary test name: 48 lines, not 50

Addresses wbreza review feedback on PR #79.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer pushed a commit that referenced this pull request Mar 10, 2026
- Update golden strings: 'no action verbs' → 'no common lead words'
- Clarify spec-security summary when FrontmatterRaw is nil
- Add TODO for plumbing SkillCount into production callers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/sensei-parity branch from a218bdd to d84b276 Compare March 10, 2026 23:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment thread internal/scoring/scoring.go
Comment thread internal/checks/advisory_checks.go
Comment thread internal/checks/advisory_checks.go
spboyer and others added 6 commits March 11, 2026 10:54
…level, advisory checks 16-18, context-dependent anti-triggers

Implements seven features to align waza scoring with spboyer/sensei:

- #72: Add 'WHEN:' trigger pattern recognition (cross-model optimized)
- #73: Add spec-security checker (flags XML angle brackets and reserved prefixes)
- #74: Add Invalid adherence level for descriptions >1024 chars (hard-fail)
- #75: Add cross-model density checker (advisory 16: word count >60, action verb check)
- #76: Add body structure quality checker (advisory 17: examples, code blocks, error handling)
- #77: Add progressive disclosure checker (advisory 18: flags >500 line bodies, >50 line code blocks)
- #78: Add context-dependent anti-trigger risk (Low: 1-5 skills, Moderate: 6-14, High: 15+)

All changes include comprehensive test coverage.

Closes #72, closes #73, closes #74, closes #75, closes #76, closes #77, closes #78

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rsing, slice traversal, WHEN: count, body field, summary format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ity scaling, security checker nil guard, body line off-by-one, error pattern narrowing

- countPhrasesAfterPattern now stops at all trigger headers (USE FOR:,
  WHEN:, TRIGGERS:, NOT FOR:) to prevent double-counting across sections
- Anti-trigger risk severity scales with context: High→error, Moderate→warning
- SpecSecurityChecker checks reserved name prefixes even when FrontmatterRaw
  is nil (programmatically constructed skills)
- ProgressiveDisclosureChecker trims trailing newline before line count to
  avoid off-by-one at the 500-line boundary
- BodyStructureChecker error handling patterns narrowed from broad "error"
  substring to heading-aware patterns ("## error", "error handling",
  "## troubleshooting") to reduce false positives
- Updated display and loop golden strings for new spec-security and
  advisory 16-18 checkers
- All 40 test packages pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move numberedStepsPattern regex to package-level var (was recompiled per Check() call)
- Pre-compute strings.ToUpper(after) in countPhrasesAfterPattern to avoid
  redundant uppercasing on each loop iteration
- Fix boundary test name: 48 lines, not 50

Addresses wbreza review feedback on PR #79.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Update golden strings: 'no action verbs' → 'no common lead words'
- Clarify spec-security summary when FrontmatterRaw is nil
- Add TODO for plumbing SkillCount into production callers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/sensei-parity branch from d84b276 to 0036d56 Compare March 11, 2026 17:54

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both unresolved threads already addressed: (1) SkillCount — TODO added at scoring.go:116-117, plumbing is follow-up scope. (2) spec-security nil FrontmatterRaw — summary now says 'name prefix check only; raw frontmatter not available'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sensei-parity Parity with spboyer/sensei scoring

Projects

None yet

5 participants