feat: regression gate command (closes #364) #384
Merged
Add 'waza gate' for CI regression gates: compares baseline vs current results.json with configurable thresholds and stable exit codes.
- New 'golden' field on TestCase (YAML) and TestOutcome (JSON), propagated through runner so gate can read it without re-reading the eval YAML.
- Stable exit codes: 0 pass, 1 regression, 2 golden failure, 3 config error. Golden failure takes precedence over plain regression.
- Configurable policies for new/removed tasks (allow/warn/fail).
- Output formats: human, json, markdown, github-actions (annotations + $GITHUB_STEP_SUMMARY).
- Conservative golden detection: treat task as golden if either side marks it, so older baselines without the field don't bypass enforcement.
- Tests cover all acceptance criteria (regression threshold, golden hard-fail, task-set policies, all four output formats, config errors).
- Docs: new CLI reference section, CI/CD guide snippets for GitHub Actions and Azure DevOps, golden field documented in eval YAML guide.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
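The exit-code precedence described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in cmd/waza/cmd_gate.go: the `gateFindings` type and `exitCodeFor` function are hypothetical names, and checking config errors first is an assumption (the commit only states that golden failure takes precedence over plain regression).

```go
package main

import "fmt"

// Exit codes as listed in the commit message.
const (
	exitPass          = 0
	exitRegression    = 1
	exitGoldenFailure = 2
	exitConfigError   = 3
)

// gateFindings is a hypothetical summary of what a gate run observed.
type gateFindings struct {
	configErr    bool // inputs or flags could not be used
	goldenFailed bool // a required golden task did not pass
	regressed    bool // regression threshold or a task-set "fail" policy tripped
}

// exitCodeFor maps findings to one exit code. Golden failure outranks plain
// regression, per the commit message. Placing config errors first is an
// assumption made for this sketch.
func exitCodeFor(f gateFindings) int {
	switch {
	case f.configErr:
		return exitConfigError
	case f.goldenFailed:
		return exitGoldenFailure
	case f.regressed:
		return exitRegression
	default:
		return exitPass
	}
}

func main() {
	// Both a golden failure and a regression: the golden code wins.
	fmt.Println(exitCodeFor(gateFindings{goldenFailed: true, regressed: true})) // prints 2
}
```

Collapsing every outcome into a single ordered switch keeps the codes stable for CI scripts that branch on `$?`.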
Pull request overview
Adds a new CI-focused regression gate to waza, introducing a waza gate command that compares results.json files (baseline vs current), enforces regression and “golden task must pass” policies, and emits stable exit codes + CI-friendly output formats. This fits into the CLI’s existing results tooling (alongside waza compare) and extends the results schema to carry golden metadata end-to-end.
Changes:
- Introduces `waza gate` (human/json/markdown/github-actions output, stable exit codes, task-set delta policies).
- Adds `golden: true` support on eval tasks and propagates it into `results.json` (`TestCase.Golden` → `TestOutcome.Golden`).
- Updates site docs to document `waza gate` and the new `golden` task field.
| File | Description |
|---|---|
| cmd/waza/cmd_gate.go | Implements the new waza gate command, report model, gating logic, and renderers. |
| cmd/waza/cmd_gate_test.go | Adds acceptance-criteria tests for gating behavior, exit codes, and output formats. |
| cmd/waza/main.go | Adds ExitCodeError plumbing to support stable subcommand-selected exit codes. |
| cmd/waza/root.go | Registers the new gate subcommand. |
| internal/models/testcase.go | Adds TestCase.Golden YAML/JSON field to mark golden tasks. |
| internal/models/outcome.go | Adds TestOutcome.Golden JSON field to persist golden status into results.json. |
| internal/orchestration/runner.go | Propagates Golden into emitted TestOutcomes across execution paths. |
| site/src/content/docs/reference/cli.mdx | Documents waza gate flags, exit codes, and examples. |
| site/src/content/docs/guides/eval-yaml.mdx | Documents the new golden task field. |
| site/src/content/docs/guides/ci-cd.mdx | Adds CI wiring examples for waza gate (GitHub Actions + Azure DevOps). |
Review details
- Files reviewed: 10/10 changed files
- Comments generated: 6
- Review effort level: Low
- Default --max-regression-pct now 0 (was 5.0); explicit threshold required to tolerate any drop in success rate
- Help examples updated: separate examples for default (zero tolerance) and for tolerating a 5pp drop
- Set SilenceUsage/SilenceErrors on gate cobra cmd; ExitCodeError now carries a meaningful message (e.g. 'waza gate: regression (exit 1)')
- GitHub Actions formatter demotes golden annotations to ::warning:: with title 'Golden task failed (non-blocking)' when --golden-must-pass=false; preserves ::error:: only when goldens are required
- Tests: defaultOpts maxRegressionPct=0; TestGate_RegressionWithinThresholdPasses now sets the threshold explicitly; new TestGate_DefaultZeroThresholdFailsAnyRegression and TestGate_FormatGitHubActionsDemotesGoldenWhenPolicyRelaxed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
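The annotation demotion described in this commit might look roughly like the sketch below. It uses the standard GitHub Actions workflow-command form `::level title=...::message`. The function name `goldenAnnotation`, the message text, and the `::error::` title are assumptions for illustration; only the warning title is taken from the commit message.

```go
package main

import "fmt"

// goldenAnnotation renders one workflow command for a golden task that did
// not pass. When goldens are required it stays an error; otherwise it is
// demoted to a non-blocking warning.
func goldenAnnotation(taskID string, goldenMustPass bool) string {
	msg := fmt.Sprintf("task %s did not pass", taskID) // hypothetical message text
	if goldenMustPass {
		// The error title here is an assumption; the commit does not state it.
		return "::error title=Golden task failed::" + msg
	}
	return "::warning title=Golden task failed (non-blocking)::" + msg
}

func main() {
	fmt.Println(goldenAnnotation("task-a", true))
	fmt.Println(goldenAnnotation("task-a", false))
}
```

Keeping the level decision in one place means the annotation severity cannot drift from the `--golden-must-pass` policy that determines the exit code.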
Closes #364.
Adds a new `waza gate` command for CI regression gates, plus a `golden: true` task field (absorbing #359) that gate enforces as a hard requirement.
What's new
waza gate
Stable exit codes
| Code | Meaning |
|---|---|
| 0 | Pass |
| 1 | Regression exceeding `--max-regression-pct`, or a task-set `fail` policy triggered |
| 2 | A task marked `golden: true` did not pass (takes precedence over regression) |
| 3 | Config error |
Golden tasks
New `golden: true` field on tasks in eval YAML. It's propagated all the way through to `results.json` (`TestOutcome.Golden`) so `waza gate` can enforce it from results alone, without needing to re-read the eval YAML.
Conservative detection: a task is treated as golden if either the baseline or the current run marks it golden. This avoids regressions slipping through when an older baseline predates the field.
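A minimal sketch of the conservative detection rule, assuming a trimmed stand-in for `TestOutcome`. The `outcome` type, its JSON tags, and the `isGolden` helper are hypothetical and not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// outcome is a hypothetical, trimmed stand-in for TestOutcome.
// The JSON key names are assumptions for this sketch.
type outcome struct {
	Name   string `json:"name"`
	Golden bool   `json:"golden,omitempty"`
}

// isGolden treats a task as golden if either side marks it. A nil side
// (task absent from that run) simply contributes nothing.
func isGolden(baseline, current *outcome) bool {
	return (baseline != nil && baseline.Golden) || (current != nil && current.Golden)
}

func main() {
	var oldBaseline, current outcome
	// An older baseline has no "golden" key, so the field decodes to false.
	if err := json.Unmarshal([]byte(`{"name":"task-a"}`), &oldBaseline); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(`{"name":"task-a","golden":true}`), &current); err != nil {
		panic(err)
	}
	fmt.Println(isGolden(&oldBaseline, &current)) // prints true
}
```

Using OR rather than AND is what keeps a baseline that predates the field from silently disabling enforcement.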
Output formats
- `human` (default): colored summary with regression table, golden status, task-set deltas
- `json`: machine-readable `GateReport`
- `markdown`: PR-comment-friendly report
- `github-actions`: emits `::error::` / `::warning::` / `::notice::` annotations on stdout and appends a markdown summary to `$GITHUB_STEP_SUMMARY` when set
Tests
`cmd/waza/cmd_gate_test.go` covers all acceptance criteria:
- `--golden-must-pass=false` allows golden to fail without exit 2
- `golden` YAML roundtrip
Docs
- `site/src/content/docs/reference/cli.mdx`: full `## waza gate` section
- `site/src/content/docs/guides/ci-cd.mdx`: GitHub Actions + Azure DevOps snippets
- `site/src/content/docs/guides/eval-yaml.mdx`: `golden` field in task fields table
Design notes (simple choices for ambiguous spec items)
- `--on-new-tasks=allow` (additive growth is good), `--on-removed-tasks=warn` (visibility without breaking PRs that intentionally prune tasks). Both can be overridden to `fail` for stricter CI.
- `--max-regression-pct=0` default: no regression tolerated unless explicitly allowed.
- `ExitCodeError` in `cmd/waza/main.go` so subcommands can request specific exit codes without leaking that concern across the rest of the CLI.
Files
- `cmd/waza/cmd_gate.go`, `cmd/waza/cmd_gate_test.go`
- `cmd/waza/main.go` (`ExitCodeError`), `cmd/waza/root.go` (register command), `internal/models/{testcase,outcome}.go` (Golden field), `internal/orchestration/runner.go` (propagate Golden through 3 emit paths), site docs.