feat: spec verify command (closes #361) #385
Merged
Pull request overview
This PR adds a new waza spec verify CLI workflow that deterministically extracts requirements from SKILL.md, computes eval task coverage (with optional LLM-assisted semantic matching), and reports results in human/JSON/GitHub Actions formats. It extends the evaluation tooling in waza by making skill-contract drift visible and CI-gateable, aligning with issue #361’s “spec-to-test” verification goal.
Changes:
- Add `spec verify` command and `internal/specverify` package for parsing SKILL.md requirements and mapping them to eval task coverage (deterministic + optional semantic).
- Add tests covering parsing, deterministic coverage, CSV-backed tasks, and CLI behaviors.
- Update README, PRD, and site docs to document the new command and CI usage.
Show a summary per file
| File | Description |
|---|---|
| site/src/content/docs/reference/cli.mdx | Adds CLI reference docs for waza spec verify flags and examples. |
| site/src/content/docs/guides/spec-verify.mdx | New guide explaining spec verification, worked example, and CI snippet. |
| site/src/content/docs/guides/ci-cd.mdx | Adds a CI/CD section describing spec coverage checks with GitHub Actions annotations. |
| site/astro.config.mjs | Adds “Spec Verification” to the site navigation. |
| README.md | Documents waza spec verify usage and flags in the main README. |
| docs/PRD.md | Adds PRD entry for verifying eval coverage against SKILL.md requirements. |
| internal/specverify/types.go | Defines report/requirement/task types for spec verification output. |
| internal/specverify/parse.go | Implements deterministic SKILL.md parsing into requirement IDs + source spans. |
| internal/specverify/semantic.go | Adds optional semantic matcher using an execution engine as judge. |
| internal/specverify/coverage.go | Implements eval task loading and requirement-to-task coverage computation. |
| internal/specverify/parse_test.go | Tests deterministic extraction + spans and validates against existing corpus files when present. |
| internal/specverify/coverage_test.go | Tests deterministic coverage, semantic response parsing, and CSV dataset task loading. |
| cmd/waza/root.go | Wires the new spec command into the root CLI. |
| cmd/waza/cmd_spec.go | Implements waza spec verify command, flags, output rendering, and semantic engine wiring. |
| cmd/waza/cmd_spec_test.go | Adds CLI-level tests for presence, JSON output, and fail mode behavior. |
Review details
- Files reviewed: 15/15 changed files
- Comments generated: 2
- Review effort level: Low
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
14543f2 to d84d807
This was referenced Jun 26, 2026
Summary
- `waza spec verify` with deterministic SKILL.md requirement extraction and eval task coverage reporting

Closes #361
Validation
- `/opt/homebrew/bin/go test ./...`
- `/opt/homebrew/bin/golangci-lint run`
- `cd site && PATH=/opt/homebrew/bin:$PATH npm run build`
- `/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format human`
- `/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format json`
- `/opt/homebrew/bin/go run ./cmd/waza spec verify examples/code-explainer/SKILL.md examples/code-explainer/eval.yaml --format github-actions --fail`
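
For readers unfamiliar with the `github-actions` format used in the last command, the sketch below shows one way coverage gaps could be rendered as GitHub Actions workflow-command annotations and turned into an exit status. The `::warning file=...,line=...::message` syntax and the `%`, CR, and LF escaping are GitHub's documented workflow-command format. The `Gap` type, the message wording, and the assumption that `--fail` means "exit non-zero when any requirement is uncovered" are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Gap is a hypothetical record for one uncovered requirement.
type Gap struct {
	File string
	Line int
	Text string
}

// escapeData applies the escaping GitHub requires for workflow-command messages.
func escapeData(s string) string {
	s = strings.ReplaceAll(s, "%", "%25") // must run first so later escapes are not re-escaped
	s = strings.ReplaceAll(s, "\r", "%0D")
	s = strings.ReplaceAll(s, "\n", "%0A")
	return s
}

// annotation renders one gap as a warning attached to a file and line.
func annotation(g Gap) string {
	msg := escapeData("uncovered requirement: " + g.Text)
	return fmt.Sprintf("::warning file=%s,line=%d::%s", g.File, g.Line, msg)
}

// exitCode models the assumed --fail behavior: non-zero only when gating is
// requested and at least one gap exists.
func exitCode(gaps []Gap, fail bool) int {
	if fail && len(gaps) > 0 {
		return 1
	}
	return 0
}

func main() {
	gaps := []Gap{{File: "examples/code-explainer/SKILL.md", Line: 4, Text: "Output should be concise."}}
	for _, g := range gaps {
		fmt.Println(annotation(g))
	}
	fmt.Println("exit code:", exitCode(gaps, true))
}
```

Keeping the exit-code decision in a pure function, separate from rendering, lets the same gap list drive human, JSON, and annotation output without duplicating the gating logic.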