feat: regression gate command (closes #364) by spboyer · Pull Request #384 · microsoft/waza

spboyer · 2026-06-28T11:15:26Z

Closes #364.

Adds a new `waza gate` command for CI regression gates, plus a `golden: true` task field (absorbing #359) that `gate` enforces as a hard requirement.

What's new

`waza gate`

waza gate --baseline baseline.json --current results.json \
  [--max-regression-pct 5] \
  [--golden-must-pass] \
  [--on-new-tasks allow|warn|fail] \
  [--on-removed-tasks allow|warn|fail] \
  [--format human|json|markdown|github-actions]

Stable exit codes

| Code | Meaning |
| --- | --- |
| `0` | Pass |
| `1` | Regression: success rate dropped beyond `--max-regression-pct`, or a task-set `fail` policy triggered |
| `2` | Golden failure: at least one task marked `golden: true` did not pass (takes precedence over regression) |
| `3` | Config error: bad flags, missing or unparseable files |
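The precedence among the exit codes listed above can be sketched in Go. This is a minimal illustration, not the code in `cmd/waza/cmd_gate.go`: the `gateInput` struct, its field names, and the `decide` function are hypothetical, and the threshold is assumed to be a drop measured in percentage points. Config errors (exit 3) would be detected earlier, while flags and files are parsed, so they do not appear in the decision function.

```go
package main

import "fmt"

// Exit codes as listed in the PR description.
const (
	exitPass       = 0
	exitRegression = 1
	exitGolden     = 2
	exitConfig     = 3 // returned before any comparison happens (bad flags, unreadable files)
)

// gateInput is a hypothetical summary of one baseline/current comparison.
// It is NOT waza's GateReport; the fields are assumptions for illustration.
type gateInput struct {
	BaselineRate     float64 // success rate in percent, 0-100
	CurrentRate      float64 // success rate in percent, 0-100
	MaxRegressionPct float64 // tolerated drop, assumed to be percentage points
	GoldenFailed     int     // number of golden tasks that did not pass
	GoldenMustPass   bool    // mirrors --golden-must-pass
	PolicyFailed     bool    // a task-set policy set to "fail" was triggered
}

// decide returns the exit code, checking golden failures first so that
// they take precedence over a plain regression.
func decide(in gateInput) int {
	if in.GoldenMustPass && in.GoldenFailed > 0 {
		return exitGolden
	}
	if in.BaselineRate-in.CurrentRate > in.MaxRegressionPct || in.PolicyFailed {
		return exitRegression
	}
	return exitPass
}

func main() {
	in := gateInput{BaselineRate: 90, CurrentRate: 80, MaxRegressionPct: 5, GoldenFailed: 1, GoldenMustPass: true}
	fmt.Println(decide(in)) // 2: golden failure wins over the 10-point drop

	in.GoldenMustPass = false
	fmt.Println(decide(in)) // 1: only the regression remains
}
```

Checking the golden condition first is what makes exit 2 win when both conditions hold; relaxing `GoldenMustPass` leaves the regression check untouched.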

Golden tasks

New `golden: true` field on tasks in the eval YAML. It is propagated through to `results.json` (`TestOutcome.Golden`), so `waza gate` can enforce it from results alone without re-reading the eval YAML.

Conservative detection: a task is treated as golden if either the baseline or the current run marks it golden. This avoids regressions slipping through when an older baseline predates the field.
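The union rule described above could look roughly like this in Go. The `outcome` type and the `failedGolden` helper are hypothetical stand-ins for entries in `results.json`; only the rule itself (golden if either side says so) comes from the description.

```go
package main

import (
	"fmt"
	"sort"
)

// outcome is a hypothetical, reduced view of one task result.
type outcome struct {
	Golden bool
	Passed bool
}

// failedGolden returns the IDs of tasks that are marked golden in either
// the baseline or the current run and did not pass in the current run.
func failedGolden(baseline, current map[string]outcome) []string {
	var failed []string
	for id, cur := range current {
		// A task missing from the baseline yields the zero value (Golden == false).
		if (cur.Golden || baseline[id].Golden) && !cur.Passed {
			failed = append(failed, id)
		}
	}
	sort.Strings(failed) // map iteration order is random; sort for stable output
	return failed
}

func main() {
	baseline := map[string]outcome{
		"a": {Golden: true, Passed: true},
		"b": {Passed: true},
	}
	// The current run lost the golden marker on "a" and both tasks fail.
	current := map[string]outcome{
		"a": {Passed: false},
		"b": {Passed: false},
	}
	fmt.Println(failedGolden(baseline, current)) // [a]
}
```

In this sketch a golden task that was removed from the current run is not reported; that case would fall to the removed-tasks policy instead.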

Output formats

human (default) — colored summary with regression table, golden status, task-set deltas
json — machine-readable GateReport
markdown — PR-comment-friendly report
github-actions — emits ::error:: / ::warning:: / ::notice:: annotations on stdout and appends a markdown summary to $GITHUB_STEP_SUMMARY when set
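As a rough sketch of what the `github-actions` format involves, the Go snippet below builds a workflow-command annotation line and appends markdown to the file named by `$GITHUB_STEP_SUMMARY`. The function names and message texts are assumptions, not taken from the PR, and the sketch skips the escaping of `%` and newlines that workflow commands need for arbitrary messages.

```go
package main

import (
	"fmt"
	"os"
)

// annotation formats one GitHub Actions workflow command,
// e.g. "::error title=Regression::message". Level is error, warning or notice.
func annotation(level, title, msg string) string {
	return fmt.Sprintf("::%s title=%s::%s", level, title, msg)
}

// appendStepSummary appends markdown to the file named by
// $GITHUB_STEP_SUMMARY. It is a no-op when the variable is unset.
func appendStepSummary(markdown string) error {
	path := os.Getenv("GITHUB_STEP_SUMMARY")
	if path == "" {
		return nil
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(markdown)
	return err
}

func main() {
	// Annotations go to stdout, where the Actions runner picks them up.
	fmt.Println(annotation("error", "Regression", "success rate dropped beyond the threshold"))

	if err := appendStepSummary("## waza gate\n\nResult: regression\n"); err != nil {
		fmt.Fprintln(os.Stderr, "could not write step summary:", err)
	}
}
```

Outside GitHub Actions the summary step does nothing, so the same formatter can run locally without side effects.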

Tests

cmd/waza/cmd_gate_test.go covers all acceptance criteria:

Pass when no regression
Regression exceeds threshold
Golden failure takes precedence over regression
--golden-must-pass=false allows golden to fail without exit 2
Golden detected from baseline even when missing in current
New/removed task policies (allow/warn/fail combinations)
All four output formats render correctly
Config errors return exit 3
golden YAML roundtrip

Docs

site/src/content/docs/reference/cli.mdx — full ## waza gate section
site/src/content/docs/guides/ci-cd.mdx — GitHub Actions + Azure DevOps snippets
site/src/content/docs/guides/eval-yaml.mdx — golden field in task fields table

Design notes (simple choices for ambiguous spec items)

Golden detection: union across baseline/current (described above) — safer for older baselines.
Default policy: --on-new-tasks=allow (additive growth is good), --on-removed-tasks=warn (visibility without breaking PRs that intentionally prune tasks). Both can be overridden to fail for stricter CI.
--max-regression-pct=0 default: no regression tolerated unless explicitly allowed.
Exit-code plumbing: introduced ExitCodeError in cmd/waza/main.go so subcommands can request specific exit codes without leaking that concern across the rest of the CLI.
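The exit-code plumbing in the last note might be sketched as follows. The PR confirms only that an `ExitCodeError` type exists in `cmd/waza/main.go`; the fields, the `exitCodeFor` helper, and `runGate` below are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ExitCodeError is a sketch of an error that carries a process exit code.
// Field names are assumed; the real type may differ.
type ExitCodeError struct {
	Code int
	Msg  string
}

func (e *ExitCodeError) Error() string { return e.Msg }

// exitCodeFor maps an error returned by a subcommand to an exit code.
// Errors that do not carry a code fall back to 1.
func exitCodeFor(err error) int {
	if err == nil {
		return 0
	}
	var ece *ExitCodeError
	if errors.As(err, &ece) { // also matches wrapped errors
		return ece.Code
	}
	return 1
}

// runGate is a hypothetical subcommand body that requests exit code 2.
func runGate() error {
	return fmt.Errorf("gate: %w", &ExitCodeError{Code: 2, Msg: "waza gate: golden failure (exit 2)"})
}

func main() {
	err := runGate()
	fmt.Println(err)              // gate: waza gate: golden failure (exit 2)
	fmt.Println(exitCodeFor(err)) // 2; a real main would pass this to os.Exit
}
```

Keeping the mapping in one helper means only `main` needs to know about exit codes, which matches the stated goal of not leaking that concern across the rest of the CLI.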

Files

New: cmd/waza/cmd_gate.go, cmd/waza/cmd_gate_test.go
Modified: cmd/waza/main.go (ExitCodeError), cmd/waza/root.go (register command), internal/models/{testcase,outcome}.go (Golden field), internal/orchestration/runner.go (propagate Golden through 3 emit paths), site docs.

Add 'waza gate' for CI regression gates: compares baseline vs current results.json with configurable thresholds and stable exit codes.

- New 'golden' field on TestCase (YAML) and TestOutcome (JSON), propagated through runner so gate can read it without re-reading the eval YAML.
- Stable exit codes: 0 pass, 1 regression, 2 golden failure, 3 config error. Golden failure takes precedence over plain regression.
- Configurable policies for new/removed tasks (allow/warn/fail).
- Output formats: human, json, markdown, github-actions (annotations + $GITHUB_STEP_SUMMARY).
- Conservative golden detection: treat task as golden if either side marks it, so older baselines without the field don't bypass enforcement.
- Tests cover all acceptance criteria (regression threshold, golden hard-fail, task-set policies, all four output formats, config errors).
- Docs: new CLI reference section, CI/CD guide snippets for GitHub Actions and Azure DevOps, golden field documented in eval YAML guide.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new CI-focused regression gate to waza, introducing a waza gate command that compares results.json files (baseline vs current), enforces regression and “golden task must pass” policies, and emits stable exit codes + CI-friendly output formats. This fits into the CLI’s existing results tooling (alongside waza compare) and extends the results schema to carry golden metadata end-to-end.

Changes:

Introduces waza gate (human/json/markdown/github-actions output, stable exit codes, task-set delta policies).
Adds golden: true support on eval tasks and propagates it into results.json (TestCase.Golden → TestOutcome.Golden).
Updates site docs to document waza gate and the new golden task field.

Show a summary per file

| File | Description |
| --- | --- |
| `cmd/waza/cmd_gate.go` | Implements the new `waza gate` command, report model, gating logic, and renderers. |
| `cmd/waza/cmd_gate_test.go` | Adds acceptance-criteria tests for gating behavior, exit codes, and output formats. |
| `cmd/waza/main.go` | Adds `ExitCodeError` plumbing to support stable subcommand-selected exit codes. |
| `cmd/waza/root.go` | Registers the new `gate` subcommand. |
| `internal/models/testcase.go` | Adds `TestCase.Golden` YAML/JSON field to mark golden tasks. |
| `internal/models/outcome.go` | Adds `TestOutcome.Golden` JSON field to persist golden status into `results.json`. |
| `internal/orchestration/runner.go` | Propagates `Golden` into emitted `TestOutcome`s across execution paths. |
| `site/src/content/docs/reference/cli.mdx` | Documents `waza gate` flags, exit codes, and examples. |
| `site/src/content/docs/guides/eval-yaml.mdx` | Documents the new `golden` task field. |
| `site/src/content/docs/guides/ci-cd.mdx` | Adds CI wiring examples for `waza gate` (GitHub Actions + Azure DevOps). |

Review details

Files reviewed: 10/10 changed files
Comments generated: 6
Review effort level: Low

- Default --max-regression-pct now 0 (was 5.0); explicit threshold required to tolerate any drop in success rate
- Help examples updated: separate examples for default (zero tolerance) and for tolerating a 5pp drop
- Set SilenceUsage/SilenceErrors on gate cobra cmd; ExitCodeError now carries a meaningful message (e.g. 'waza gate: regression (exit 1)')
- GitHub Actions formatter demotes golden annotations to ::warning:: with title 'Golden task failed (non-blocking)' when --golden-must-pass=false; preserves ::error:: only when goldens are required
- Tests: defaultOpts maxRegressionPct=0; TestGate_RegressionWithinThresholdPasses now sets the threshold explicitly; new TestGate_DefaultZeroThresholdFailsAnyRegression and TestGate_FormatGitHubActionsDemotesGoldenWhenPolicyRelaxed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
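The annotation demotion described in this follow-up commit can be illustrated with a short Go sketch. The warning title is quoted from the commit message; the `goldenAnnotation` function, the error title, and the message text are assumptions.

```go
package main

import "fmt"

// goldenAnnotation picks the annotation level for a failed golden task.
// With goldenMustPass the failure blocks the gate and is reported as an
// error; otherwise it is demoted to a non-blocking warning.
func goldenAnnotation(taskID string, goldenMustPass bool) string {
	if goldenMustPass {
		// Title for the blocking case is assumed, not taken from the PR.
		return fmt.Sprintf("::error title=Golden task failed::%s did not pass", taskID)
	}
	return fmt.Sprintf("::warning title=Golden task failed (non-blocking)::%s did not pass", taskID)
}

func main() {
	fmt.Println(goldenAnnotation("task-1", true))
	fmt.Println(goldenAnnotation("task-1", false))
}
```

Demoting the level keeps the annotation visible in the PR while avoiding a red error marker for a result that does not change the exit code.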

Copilot AI review requested due to automatic review settings June 28, 2026 11:15

Copilot started reviewing on behalf of spboyer June 28, 2026 11:15

Copilot AI reviewed Jun 28, 2026

Comment thread cmd/waza/cmd_gate.go Outdated

Comment thread cmd/waza/cmd_gate.go Outdated

Comment thread cmd/waza/cmd_gate.go

Comment thread cmd/waza/cmd_gate.go

Comment thread cmd/waza/cmd_gate_test.go Outdated

Comment thread cmd/waza/cmd_gate_test.go

spboyer mentioned this pull request Jun 28, 2026

feat: Add waza gate command for CI regression gates (#364) #377

Closed

spboyer merged commit 169ade0 into main Jun 28, 2026
10 checks passed

spboyer deleted the spboyer-issue-364-gate branch June 28, 2026 11:30

spboyer merged 2 commits into main from spboyer-issue-364-gate

3 participants
