feat: Add skill-aware thresholds to waza tokens compare by spboyer · Pull Request #93 · microsoft/waza

spboyer · 2026-03-05T01:58:14Z

Closes #81

This change folds the proposed skill-aware token budget delta workflow into the existing waza tokens compare command instead of introducing a separate tokens diff verb.

It adds:

--skills to compare discovered SKILL.md files under configured skill roots
--threshold to fail on excessive percent growth for existing skills
limit-aware failure semantics so over-budget results cannot appear successful

This addresses review feedback about command overlap while preserving the CI-oriented budget workflow.

codecov-commenter · 2026-03-05T02:01:56Z

Codecov Report

❌ Patch coverage is 83.87097% with 25 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@091a1f9). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
cmd/waza/tokens/compare.go	83.87%	15 Missing and 10 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #93   +/-   ##
=======================================
  Coverage        ?   72.31%           
=======================================
  Files           ?      129           
  Lines           ?    14430           
  Branches        ?        0           
=======================================
  Hits            ?    10435           
  Misses          ?     3222           
  Partials        ?      773

Flag	Coverage Δ
go-implementation	`72.31% <83.87%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Adds a new waza tokens diff CLI subcommand to compare per-skill SKILL.md token counts against a base git ref, with documentation and unit tests to support token budget delta review workflows.

Changes:

Introduces waza tokens diff with table/JSON output and a configurable percent-increase threshold.
Adds unit tests covering threshold behavior, skill root discovery (skills/ + .github/skills/ + configured skills dir), and config-based limits.
Updates README + CLI docs, and records squad decision/log notes for Issue #81.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`cmd/waza/tokens/diff.go`	Implements the new `tokens diff` command, output formatting, git-ref scanning, and limit loading.
`cmd/waza/tokens/diff_test.go`	Adds tests for threshold failure, JSON output, skills roots handling, and config limits.
`cmd/waza/tokens/root.go`	Wires the new `diff` subcommand into the `tokens` command tree.
`README.md`	Documents `waza tokens diff` usage and flags.
`site/src/content/docs/reference/cli.mdx`	Adds CLI reference docs for `waza tokens diff`.
`.squad/decisions.md`	Records the Issue #81 approach/decision (currently diverges from implemented CLI shape).
`.squad/log/2026-03-05T00-26-rusty-token-diff-design.md`	Session log capturing design discussion for the feature.
`.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md`	Session log for issue-assignment pipeline activation.

Comments suppressed due to low confidence (1)

cmd/waza/tokens/diff.go:248

countSkillTokensAtRef returns 0 for any read/git error (including permission issues or an invalid ref). This can produce misleading diffs and incorrect pass/fail decisions. It would be safer to treat only “file missing at ref” as 0, and otherwise return an error up the stack (similar to compare.go’s handling of os.ReadFile errors).

func countSkillTokensAtRef(rootDir, ref, relPath string, counter tokens.Counter) int {
	if relPath == "" {
		return 0
	}
	content := ""
	if ref == git.WorkingTreeRef {
		data, err := os.ReadFile(filepath.Join(rootDir, relPath))
		if err != nil {
			return 0
		}
		content = string(data)
	} else {
		c, err := git.GetFileFromRef(rootDir, relPath, ref)
		if err != nil {
			return 0
		}

chlowell · 2026-03-05T02:15:20Z

We already have tokens compare doing most of this. Could we use or replace that so we don't have two verbs doing almost the same thing?

spboyer

Verified by Rusty (Opus 4.6) — LGTM ✅

Precisely matches the Token Diff Distribution Strategy from decisions.md:

\waza tokens diff [base-ref]\ with --format\ (table|json) and --threshold\
Smart base ref fallback: origin/main → main
Reuses existing git helper package and \ okens.Counter\
Clean \diffReport\ JSON structure for CI integration
Token limit awareness with over-limit detection
Proper exit code behavior on threshold exceeded
Three comprehensive tests (threshold, JSON, base-ref fallback)
Docs updated: README, CLI reference
CI green on ubuntu + windows + lint

This is exactly the CLI-first architecture I specified — well done.

Note: Can't self-approve via API. Setting auto-merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

wbreza

Code Review: PR #93 - feat: Add waza tokens diff command

What Looks Good

Clean architecture - compareSkillDiffs, countSkillTokensAtRef, skillRootsForRef well-separated
Good error handling - validates format, threshold, git repo, ref existence
Graceful fallback from origin/main to main when remote missing
Missing files handled cleanly (returns 0 tokens)
Threshold-based pass/fail with clear exit code
Token limits integration for over-limit detection
README and site CLI docs updated

No Critical/High/Medium findings.

Overall Assessment: Approve

wbreza

Code Review: PR #93 - feat: Add waza tokens diff command

What Looks Good

Reads .waza.yaml from both base ref and working tree for config-aware comparison
Skill root discovery reads paths.skills at each ref
Clean fallback chain: origin/main to main with existence checks
Token limits integration shows over-limit status
Good test coverage: threshold, JSON format, ref fallback, unknown ref rejection

High (1 finding)

OverLimit does not affect Passed - Table shows Over limit! warning but exit code is driven only by ThresholdExceeded (percent delta). A skill 200%% over its absolute budget exits 0 if the change was small. Consider including OverLimit in the passed calculation or removing the warning.

Medium (1 finding)

Naming overlap with tokens compare - Both diff and compare live under waza tokens and both compare token counts across git refs. The distinction (diff is skill-aware + CI-gated, compare is general-purpose) is valid but not obvious to users. Consider whether these should be consolidated or if the naming should make the distinction clearer (e.g. budget-check, drift, or delta).

Low (1 finding)

percentageDiff returns 100%% for any new skill - New files (before=0) always report +100%% regardless of size, which trips any threshold < 100. May surprise users adding small new skills.

Summary

Priority	Count
Critical	0
High	1
Medium	1
Low	1

Overall Assessment: Approve - solid implementation with good config awareness. The OverLimit/Passed semantic gap and naming overlap with compare are worth addressing in follow-up.

chlowell

We already have tokens compare doing most of this. Do you really want a new verb?

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address PR microsoft#93 review feedback from Charles and Wallace: - Add --skills flag to filter comparison to SKILL.md files under skill roots - Add --threshold flag for percentage-change CI gating - Fix pass/fail semantics: both absolute limit violations (--strict) and threshold breaches now cause exit code 1 - Exempt newly added files from threshold checks (no baseline to compare) - Extract shared compareFilesets() engine for both modes - Add skill root discovery from .waza.yaml paths.skills config - Update README and site CLI reference docs No separate 'tokens diff' command needed — compare handles both general file-level and skill-aware comparison via flags. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-06T14:32:51Z

@wbreza @chlowell - updated, thanks for the feedback/catch

Updated per review feedback:

folded the new budget-delta workflow into waza tokens compare instead of adding a separate verb
addressed Wallace's semantics concern so budget overruns can no longer look successful
made threshold checks apply to existing skills without auto-failing newly added skills

The PR title/body now reflect the new command shape.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

cmd/waza/tokens/compare.go:167

In the strict || threshold > 0 block, comparisons[i].Exceeded is set based on absolute token limits unconditionally. This makes --threshold alone start failing builds when a file is over its absolute limit, even though the docs/tests imply absolute-limit enforcement should only happen with --strict. Consider separating concerns (e.g., always populate Limit, but only mark an absolute-limit failure when strict is true; track threshold breaches with a separate flag).

	// Load token limits when --strict or --threshold is set — both need limit awareness
	if strict || threshold > 0 {
		cfg := loadCompareLimitsConfig(rootDir)
		for i := range comparisons {
			if comparisons[i].After != nil {
				lr := checks.GetLimitForFile(comparisons[i].File, cfg)
				comparisons[i].Limit = lr.Limit
				comparisons[i].Exceeded = comparisons[i].After.Tokens > lr.Limit
			}

…ounting - Add OverLimit field to fileComparison for informational limit tracking - Absolute limit breaches only set Exceeded when --strict is set - Track LimitExceeded independently in summary (no subtraction) - Add test: --threshold without --strict ignores absolute limit breaches - Add test: both --strict and --threshold report categories independently - Table display uses OverLimit for informational warnings Addresses Copilot review feedback on PR microsoft#93. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

wbreza

Code Review: PR #93 — feat: Add skill-aware thresholds to waza tokens compare

Prior Review Items

✅ [High] Passed only driven by threshold — Now uses ExceededCount == 0 which includes both threshold breaches AND absolute limit breaches (with --strict). Test validates over-limit fails even when under threshold.
✅ [Medium] diff vs compare overlap — tokens diff removed entirely. Everything folded into tokens compare with --skills and --threshold flags.

✅ What Looks Good

Directly addresses chlowell's feedback — folds into tokens compare instead of new verb
Clean refactor — compareFilesets() extracted as shared engine
Config-aware comparison — reads .waza.yaml from both base ref (via git) and working tree
Smart base ref fallback — origin/main → main → HEAD
New file exemption — skip threshold, respect limits with --strict
Independent tracking — LimitExceeded and ThresholdBreached counted separately
8 new test scenarios — skills filtering, custom roots, threshold pass/fail, combined reporting

Findings Summary

Priority	Count
Critical	0
High	0
Medium	1
Low	0
Total	1

Medium: Exceeded field semantic change in JSON output (previously informational, now failure-only). Consider noting in changelog.

Overall Assessment: Approve — clean integration addressing reviewer feedback with comprehensive tests. Note: chlowell's CHANGES_REQUESTED is still active and should verify the approach.

- Passed flag now checks both ThresholdExceeded and OverLimit — a skill over its absolute budget causes failure regardless of percent-change threshold - Clarify Long description with fallback behavior - Add inline comment above base ref fallback - Generalize error message from 'threshold exceeded' to 'budget exceeded' Addresses wbreza [High] and chlowell review feedback on PR #93. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-09T22:07:06Z

Addressed wbreza [High] and chlowell review feedback in 4208dd6:

Bug fix: Passed flag now checks both ThresholdExceeded and OverLimit — a skill over its absolute budget causes diff failure regardless of percent-change threshold
Clarified Long description with explicit fallback behavior
Added inline comment above base ref fallback logic
Generalized error message from 'threshold exceeded' to 'budget exceeded'

spboyer requested a review from chlowell as a code owner March 5, 2026 01:58

Copilot AI review requested due to automatic review settings March 5, 2026 01:58

spboyer requested a review from richardpark-msft as a code owner March 5, 2026 01:58

spboyer self-assigned this Mar 5, 2026

github-actions Bot enabled auto-merge (squash) March 5, 2026 01:58

Copilot started reviewing on behalf of spboyer March 5, 2026 01:59 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread .squad/decisions.md Outdated

spboyer commented Mar 5, 2026

View reviewed changes

spboyer force-pushed the squad/81-tokens-diff branch from c858749 to 46d9e38 Compare March 5, 2026 17:12

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #93

3655f87

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:40

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #93

ea1cc72

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/81-tokens-diff branch from 3655f87 to ea1cc72 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread site/src/content/docs/reference/cli.mdx Outdated

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address PR #93 token diff review findings

b980da0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

wbreza approved these changes Mar 5, 2026

View reviewed changes

wbreza previously approved these changes Mar 5, 2026

View reviewed changes

Comment thread cmd/waza/tokens/diff.go Outdated

Comment thread cmd/waza/tokens/root.go

chlowell requested changes Mar 6, 2026

View reviewed changes

spboyer added a commit that referenced this pull request Mar 6, 2026

fix: address review feedback on PR #93

f611d3c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit that referenced this pull request Mar 6, 2026

fix: address PR #93 token diff review findings

55cc014

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 6, 2026 14:32

spboyer dismissed wbreza’s stale review via 932cff4 March 6, 2026 14:32

spboyer force-pushed the squad/81-tokens-diff branch from b980da0 to 932cff4 Compare March 6, 2026 14:32

Copilot started reviewing on behalf of spboyer March 6, 2026 14:32 View session

spboyer changed the title ~~feat: Add waza tokens diff command~~ feat: Add skill-aware thresholds to waza tokens compare Mar 6, 2026

Copilot AI reviewed Mar 6, 2026

View reviewed changes

Comment thread cmd/waza/tokens/compare.go Outdated

Comment thread cmd/waza/tokens/compare_test.go

Comment thread README.md

wbreza approved these changes Mar 6, 2026

View reviewed changes

Comment thread cmd/waza/tokens/compare.go

chlowell approved these changes Mar 6, 2026

View reviewed changes

Comment thread cmd/waza/tokens/compare.go

Comment thread cmd/waza/tokens/compare.go

github-actions Bot merged commit 54da7a5 into microsoft:main Mar 9, 2026
6 checks passed

spboyer mentioned this pull request Mar 12, 2026

Release v0.21.0 #122

Merged

4 tasks

Uh oh!

Conversation

spboyer commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chlowell commented Mar 5, 2026

Uh oh!

spboyer left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #93 - feat: Add waza tokens diff command

What Looks Good

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #93 - feat: Add waza tokens diff command

What Looks Good

High (1 finding)

Medium (1 finding)

Low (1 finding)

Summary

Uh oh!

Uh oh!

Uh oh!

chlowell left a comment

Choose a reason for hiding this comment

Uh oh!

spboyer commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

Code Review: PR #93 — feat: Add skill-aware thresholds to waza tokens compare

Prior Review Items

✅ What Looks Good

Findings Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spboyer commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

spboyer commented Mar 5, 2026 •

edited

Loading

codecov-commenter commented Mar 5, 2026 •

edited

Loading

spboyer commented Mar 6, 2026 •

edited

Loading