fix: macOS install + trigger test off-by-1 count (#164, #184) by spboyer · Pull Request #193 · microsoft/waza

spboyer · 2026-04-21T14:48:49Z

Fixes

#164 — install.sh fails on macOS

The script used sha256sum which doesn't exist on macOS. Now prioritizes shasum -a 256 (native macOS), falls back to sha256sum, and errors if neither is available.

#184 — Trigger test result values off by 1

ComputeTriggerMetrics was using confidence-weighted sums for integer TP/FP/TN/FN counts. Medium-confidence prompts (weight 0.5) caused counts to be fractional — e.g., 6 high + 2 medium = 7.0 instead of 8. Fix separates integer counts (always whole) from weighted precision/recall/F1.

Closes #164
Closes #184

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

The install script was failing on macOS because it prioritized sha256sum over shasum. While sha256sum exists on some macOS systems (via Homebrew), the BSD version doesn't support the -c flag needed for checksum verification. This fix: - Prioritizes shasum (native on macOS, supports -c flag) - Falls back to sha256sum only if it supports the -c flag - Exits with an error if no compatible utility is found (rather than skipping verification with a warning) Fixes #164 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ComputeTriggerMetrics used weighted sums (confidence-adjusted) for the integer TP/FP/TN/FN counts. Medium-confidence prompts contributed 0.5 instead of 1.0, so groups with 6 high + 2 medium prompts reported 7 instead of 8. Fix: track actual result counts for the integer fields while keeping weighted values for precision/recall/F1/accuracy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-04-21T14:49:31Z

Testing completed ✅

Verified on macOS Darwin/ARM64:

shasum -a 256 -c works correctly for checksum verification
Homebrew sha256sum exists but doesn't support -c flag (BSD version)
Fix correctly prioritizes shasum first
Script properly fails on checksum mismatch
Script exits with clear error when no compatible utility found

Ready for review and merge.

Copilot

Pull request overview

This PR fixes macOS binary installation failures by improving checksum verification in install.sh, and also updates trigger-metrics computation/tests to separate raw confusion-matrix counts from confidence-weighted scoring.

Changes:

Update install.sh to prefer shasum on macOS and only use sha256sum when it supports -c.
Adjust trigger-metrics computation to track integer (raw) TP/FP/TN/FN counts separately from confidence-weighted precision/recall/F1/accuracy.
Add a regression test to ensure medium confidence weighting does not reduce integer counts.

Show a summary per file

File	Description
install.sh	Fix checksum verification tool selection for macOS/Linux and fail fast when no compatible tool exists.
internal/models/trigger_metrics.go	Separate raw counts from weighted scoring inputs when computing trigger metrics.
internal/models/trigger_metrics_test.go	Update/add tests validating the new raw-count vs weighted-score behavior.

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 2

Copilot · 2026-04-21T14:53:38Z

+	// Track actual counts for display and weighted values for scoring.
+	var tpCount, fpCount, tnCount, fnCount int
+	var tpW, fpW, tnW, fnW float64


TriggerMetrics now appears to return unweighted integer TP/FP/TN/FN counts while precision/recall/F1/accuracy are computed from confidence-weighted values. This can make the JSON/CLI output internally inconsistent (e.g., accuracy won’t equal (TP+TN)/(TP+FP+TN+FN)), and consumers may misinterpret the counts. Please update the exported docs (struct or function comment) to explicitly state the different semantics, or consider exposing weighted counts separately if both are needed.

Copilot · 2026-04-21T14:53:38Z

+		{
+			name: "medium confidence does not reduce integer counts (issue #184)",
+			results: func() []TriggerResult {
+				// 8 should-trigger (6 high + 2 medium), 8 should-not-trigger (6 high + 2 medium)
+				// None triggered → all should-trigger are FN, all should-not-trigger are TN.
+				var r []TriggerResult
+				for i := 0; i < 6; i++ {


The PR title/description are scoped to fixing install.sh checksum verification on macOS (issue #164), but this change set also adds a new trigger-metrics test referencing a different issue (#184). Please either update the PR description/title to include this additional behavior change or split the trigger-metrics work into a separate PR so the intent and review scope are clear.

Copilot AI review requested due to automatic review settings April 21, 2026 14:48

spboyer requested review from chlowell, richardpark-msft and wbreza as code owners April 21, 2026 14:48

github-actions Bot enabled auto-merge (squash) April 21, 2026 14:49

Copilot started reviewing on behalf of spboyer April 21, 2026 14:49 View session

spboyer changed the title ~~fix: install.sh uses shasum on macOS when sha256sum unavailable (#164)~~ fix: macOS install + trigger test off-by-1 count (#164, #184) Apr 21, 2026

spboyer merged commit 9ff4bb3 into main Apr 21, 2026
6 checks passed

spboyer deleted the squad/184-trigger-off-by-one branch April 21, 2026 14:53

Copilot AI reviewed Apr 21, 2026

View reviewed changes

This was referenced Apr 21, 2026

Trigger test result values off by 1 #184

Closed

Binary installations fails on macOS #164

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: macOS install + trigger test off-by-1 count (#164, #184)#193

fix: macOS install + trigger test off-by-1 count (#164, #184)#193
spboyer merged 2 commits into
mainfrom
squad/184-trigger-off-by-one

spboyer commented Apr 21, 2026 •

edited

Loading

Uh oh!

spboyer commented Apr 21, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

spboyer commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixes

#164 — install.sh fails on macOS

#184 — Trigger test result values off by 1

Uh oh!

spboyer commented Apr 21, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

spboyer commented Apr 21, 2026 •

edited

Loading