feat: spec verify command (closes #361) by spboyer · Pull Request #385 · microsoft/waza

spboyer · 2026-06-28T11:17:28Z

Summary

Add waza spec verify with deterministic SKILL.md requirement extraction and eval task coverage reporting
Add opt-in semantic matching, CI fail/warn modes, and human/JSON/GitHub Actions output
Document the workflow in README, PRD, and site docs with CI examples
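To illustrate the fail/warn modes and the GitHub Actions output listed above, here is a minimal sketch, not the code from this PR. It uses the documented GitHub Actions workflow-command syntax (`::warning file=...,line=...::message`). The `Uncovered` type, the `REQ-001` ID format, and the function names are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Uncovered is a hypothetical record for a requirement that no eval task covers.
type Uncovered struct {
	ID   string // illustrative requirement ID, format assumed
	File string // path to the SKILL.md file
	Line int    // 1-based line where the requirement starts
	Text string // requirement text
}

// escapeData escapes a workflow-command message the way GitHub Actions expects.
var escapeData = strings.NewReplacer("%", "%25", "\r", "%0D", "\n", "%0A")

// renderAnnotations emits one workflow command per uncovered requirement.
// In fail mode the annotation level is "error", otherwise "warning".
// Property values (file, line) are assumed to contain no ',' or ':'.
func renderAnnotations(items []Uncovered, fail bool) string {
	level := "warning"
	if fail {
		level = "error"
	}
	var b strings.Builder
	for _, u := range items {
		msg := fmt.Sprintf("requirement %s has no eval coverage: %s", u.ID, u.Text)
		fmt.Fprintf(&b, "::%s file=%s,line=%d::%s\n", level, u.File, u.Line, escapeData.Replace(msg))
	}
	return b.String()
}

// exitCode is non-zero only when fail mode is on and something is uncovered.
func exitCode(items []Uncovered, fail bool) int {
	if fail && len(items) > 0 {
		return 1
	}
	return 0
}

func main() {
	items := []Uncovered{{ID: "REQ-001", File: "SKILL.md", Line: 12, Text: "Explain code in plain language"}}
	fmt.Print(renderAnnotations(items, false))
	fmt.Println("exit code in fail mode:", exitCode(items, true))
}
```

Keeping rendering separate from the exit-code decision means warn mode and fail mode can share the same report and differ only in annotation level and process status.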

Closes #361

Validation

/opt/homebrew/bin/go test ./...
/opt/homebrew/bin/golangci-lint run
cd site && PATH=/opt/homebrew/bin:$PATH npm run build
/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format human
/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format json
/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format github-actions --fail

Copilot

Pull request overview

This PR adds a new waza spec verify CLI workflow that deterministically extracts requirements from SKILL.md, computes eval task coverage (with optional LLM-assisted semantic matching), and reports results in human/JSON/GitHub Actions formats. It extends the evaluation tooling in waza by making skill-contract drift visible and CI-gateable, aligning with issue #361’s “spec-to-test” verification goal.
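As a rough picture of what deterministic extraction could look like, the sketch below treats every Markdown list item as one requirement and assigns sequential IDs together with the source line. This is an assumption for illustration only. The actual rules in `internal/specverify/parse.go` are not shown on this page, and the `Requirement` type and `REQ-NNN` IDs here are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Requirement is a hypothetical shape: an ID, the text, and the source line.
type Requirement struct {
	ID   string
	Text string
	Line int // 1-based line number in the SKILL.md source
}

// extractRequirements scans Markdown line by line and turns each "- " or "* "
// list item into a requirement. The same input always yields the same IDs and
// lines, which is what makes the extraction deterministic.
func extractRequirements(markdown string) []Requirement {
	var reqs []Requirement
	sc := bufio.NewScanner(strings.NewReader(markdown))
	line := 0
	for sc.Scan() {
		line++
		trimmed := strings.TrimSpace(sc.Text())
		var text string
		switch {
		case strings.HasPrefix(trimmed, "- "):
			text = strings.TrimPrefix(trimmed, "- ")
		case strings.HasPrefix(trimmed, "* "):
			text = strings.TrimPrefix(trimmed, "* ")
		default:
			continue // not a list item, so not a requirement in this sketch
		}
		text = strings.TrimSpace(text)
		if text == "" {
			continue
		}
		reqs = append(reqs, Requirement{
			ID:   fmt.Sprintf("REQ-%03d", len(reqs)+1),
			Text: text,
			Line: line,
		})
	}
	return reqs
}

func main() {
	doc := "# Code Explainer\n\n- Explain code in plain language\n- Mention edge cases\n"
	for _, r := range extractRequirements(doc) {
		fmt.Printf("%s (line %d): %s\n", r.ID, r.Line, r.Text)
	}
}
```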

Changes:

Add spec verify command and internal/specverify package for parsing SKILL.md requirements and mapping them to eval task coverage (deterministic + optional semantic).
Add tests covering parsing, deterministic coverage, CSV-backed tasks, and CLI behaviors.
Update README, PRD, and site docs to document the new command and CI usage.
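The deterministic mapping from requirements to eval tasks could be sketched as a keyword check. The rule assumed here is that a task covers a requirement when every word of four or more letters in the requirement also appears in the task's name or prompt. That rule, the `Task` type, and the function names are illustrative assumptions, not the matching logic from `internal/specverify/coverage.go`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Task is a hypothetical, minimal view of an eval task.
type Task struct {
	Name   string
	Prompt string
}

// tokens lowercases s, splits on non-alphanumeric runes, and keeps words of
// at least four bytes as a crude stop-word filter.
func tokens(s string) map[string]bool {
	set := map[string]bool{}
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		if len(w) >= 4 {
			set[w] = true
		}
	}
	return set
}

// coveringTasks returns the names of tasks whose name and prompt contain
// every keyword of the requirement, in task order.
func coveringTasks(requirement string, tasks []Task) []string {
	want := tokens(requirement)
	if len(want) == 0 {
		return nil
	}
	var names []string
	for _, t := range tasks {
		have := tokens(t.Name + " " + t.Prompt)
		covered := true
		for w := range want {
			if !have[w] {
				covered = false
				break
			}
		}
		if covered {
			names = append(names, t.Name)
		}
	}
	return names
}

func main() {
	tasks := []Task{
		{Name: "explain-loop", Prompt: "Explain this Python loop to a beginner"},
		{Name: "refactor", Prompt: "Refactor the function"},
	}
	fmt.Println(coveringTasks("Explain a loop", tasks))
	fmt.Println(len(coveringTasks("Handle recursion", tasks)) == 0)
}
```

A purely lexical rule like this misses paraphrases, which is the gap the opt-in semantic matching in this PR is described as filling.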

Show a summary per file

File	Description
site/src/content/docs/reference/cli.mdx	Adds CLI reference docs for `waza spec verify` flags and examples.
site/src/content/docs/guides/spec-verify.mdx	New guide explaining spec verification, worked example, and CI snippet.
site/src/content/docs/guides/ci-cd.mdx	Adds a CI/CD section describing spec coverage checks with GitHub Actions annotations.
site/astro.config.mjs	Adds “Spec Verification” to the site navigation.
README.md	Documents `waza spec verify` usage and flags in the main README.
docs/PRD.md	Adds PRD entry for verifying eval coverage against SKILL.md requirements.
internal/specverify/types.go	Defines report/requirement/task types for spec verification output.
internal/specverify/parse.go	Implements deterministic SKILL.md parsing into requirement IDs + source spans.
internal/specverify/semantic.go	Adds optional semantic matcher using an execution engine as judge.
internal/specverify/coverage.go	Implements eval task loading and requirement-to-task coverage computation.
internal/specverify/parse_test.go	Tests deterministic extraction + spans and validates against existing corpus files when present.
internal/specverify/coverage_test.go	Tests deterministic coverage, semantic response parsing, and CSV dataset task loading.
cmd/waza/root.go	Wires the new `spec` command into the root CLI.
cmd/waza/cmd_spec.go	Implements `waza spec verify` command, flags, output rendering, and semantic engine wiring.
cmd/waza/cmd_spec_test.go	Adds CLI-level tests for presence, JSON output, and fail mode behavior.

Review details

Files reviewed: 15/15 changed files
Comments generated: 2
Review effort level: Low

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Review details

Files reviewed: 15/15 changed files
Comments generated: 4
Review effort level: Low

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 28, 2026 11:17

Copilot started reviewing on behalf of spboyer June 28, 2026 11:17

Copilot AI reviewed Jun 28, 2026

Comment thread internal/specverify/coverage.go Outdated

Comment thread cmd/waza/cmd_spec.go

Copilot AI added 2 commits June 28, 2026 07:34

feat: add spec verify command #361

ea53761

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address spec verify review feedback #361

d84d807

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the spboyer-spec-verify-command branch from 14543f2 to d84d807 on June 28, 2026 11:35

Copilot AI review requested due to automatic review settings June 28, 2026 11:35

Copilot started reviewing on behalf of spboyer June 28, 2026 11:35

Copilot AI reviewed Jun 28, 2026

Comment thread internal/specverify/coverage.go

Comment thread cmd/waza/cmd_spec.go

Comment thread cmd/waza/cmd_spec.go

Comment thread cmd/waza/cmd_spec.go Outdated

fix: address spec verify review feedback

a20d385

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer merged commit f0770af into main Jun 28, 2026
9 checks passed

spboyer deleted the spboyer-spec-verify-command branch June 28, 2026 11:48

This was referenced Jun 26, 2026

feat: Grader plugin extensibility (WASM/external programs) #18

Open

feat: Composable eval construction from registry graders #17

Open

feat: Go-module-style grader/eval references #15

Open

feat: spec verify command (closes #361) #385: spboyer merged 3 commits into main from spboyer-spec-verify-command
