feat: add `waza quality` command — LLM-as-Judge skill quality scoring by spboyer · Pull Request #218 · microsoft/waza

spboyer · 2026-04-22T18:14:34Z

Summary

Closes #98

Adds `waza quality <skill-path>`, an LLM-as-Judge command that evaluates the quality of a skill's content across five dimensions, each scored from 1 to 5:

| Dimension | What it measures |
| --- | --- |
| `clarity` | Instruction clarity, structure, step ordering |
| `completeness` | Edge-case coverage, level of detail |
| `trigger_precision` | Quality of the USE FOR / DO NOT USE FOR definitions |
| `scope_coverage` | Boundary definition, how explicitly capabilities are stated |
| `anti_patterns` | Avoidance of vague or conflicting instructions |
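As an illustration of the kind of checking `rubric.go` is described as doing ("dimension definitions, validation"), here is a minimal sketch that assumes scores arrive as a map from dimension name to integer. The function name `validateScores` and the map input are hypothetical; the PR does not show its actual types. It reports missing dimensions, scores outside 1–5, and unknown dimension names.

```go
package main

import "fmt"

// dimensions lists the five rubric dimensions named in the PR description.
var dimensions = []string{
	"clarity",
	"completeness",
	"trigger_precision",
	"scope_coverage",
	"anti_patterns",
}

const (
	minScore = 1
	maxScore = 5
)

// validateScores is a hypothetical validator: it returns one error per
// missing dimension, out-of-range score, or unrecognized dimension name.
func validateScores(scores map[string]int) []error {
	var errs []error
	known := make(map[string]bool, len(dimensions))
	for _, d := range dimensions {
		known[d] = true
		s, ok := scores[d]
		if !ok {
			errs = append(errs, fmt.Errorf("missing dimension %q", d))
			continue
		}
		if s < minScore || s > maxScore {
			errs = append(errs, fmt.Errorf("dimension %q: score %d out of range [%d, %d]", d, s, minScore, maxScore))
		}
	}
	// Flag any names the rubric does not define.
	for name := range scores {
		if !known[name] {
			errs = append(errs, fmt.Errorf("unknown dimension %q", name))
		}
	}
	return errs
}

func main() {
	// completeness is out of range and anti_patterns is missing.
	scores := map[string]int{"clarity": 4, "completeness": 6, "trigger_precision": 3, "scope_coverage": 5}
	for _, err := range validateScores(scores) {
		fmt.Println(err)
	}
}
```

Returning every problem rather than stopping at the first one fits the PR's stated behavior of still displaying partial results with a warning.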

New files

internal/quality/rubric.go — Dimension definitions, validation
internal/quality/judge.go — Prompt construction, Copilot SDK execution, JSON response parsing
internal/quality/report.go — Table (with visual score bars) and JSON formatters
cmd/waza/cmd_quality.go — CLI command registration
cmd/waza/cmd_quality_test.go — 8 command-level tests
internal/quality/*_test.go — 20 unit tests
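`report.go` is described as rendering a table "with visual score bars". The glyphs and layout it actually uses are not shown, so the following is only a sketch of one way to draw a 1–5 score as a fixed-width bar. The helper name `scoreBar` and the `█`/`░` characters are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// scoreBar (hypothetical) renders score out of width as filled and empty
// cells, e.g. 3 of 5 becomes "███░░". Out-of-range scores are clamped.
func scoreBar(score, width int) string {
	if score < 0 {
		score = 0
	}
	if score > width {
		score = width
	}
	return strings.Repeat("█", score) + strings.Repeat("░", width-score)
}

func main() {
	rows := []struct {
		name  string
		score int
	}{
		{"clarity", 4},
		{"completeness", 3},
		{"trigger_precision", 5},
	}
	for _, r := range rows {
		// %-18s left-aligns the dimension name in an 18-character column.
		fmt.Printf("%-18s %s %d/5\n", r.name, scoreBar(r.score, 5), r.score)
	}
}
```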

CLI usage

waza quality skills/my-skill                    # table output
waza quality skills/my-skill --format json      # JSON for CI
waza quality skills/my-skill --model gpt-4o     # specific judge model
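The `--format json` mode is intended for CI. A minimal gate that fails a pipeline when the overall score is too low might look like the sketch below. It assumes the JSON output uses the same `overall_score` and `summary` fields as the judge schema described under Design decisions; that is not confirmed by the PR. The threshold of 3.5 and the file name `gate.go` are arbitrary, hypothetical choices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// qualityReport assumes the JSON output exposes these two fields.
type qualityReport struct {
	OverallScore float64 `json:"overall_score"`
	Summary      string  `json:"summary"`
}

// checkThreshold decodes a report and returns an error if its overall
// score is below threshold.
func checkThreshold(data []byte, threshold float64) error {
	var rep qualityReport
	if err := json.Unmarshal(data, &rep); err != nil {
		return fmt.Errorf("decoding quality report: %w", err)
	}
	if rep.OverallScore < threshold {
		return fmt.Errorf("overall_score %.2f is below threshold %.2f: %s",
			rep.OverallScore, threshold, rep.Summary)
	}
	return nil
}

func main() {
	// Hypothetical usage:
	//   waza quality skills/my-skill --format json | go run gate.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		os.Exit(1)
	}
	if err := checkThreshold(data, 3.5); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("quality gate passed")
}
```

Decoding into `float64` accepts both integer and fractional scores, since the PR does not say which the judge returns.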

Flags

| Flag | Default | Description |
| --- | --- | --- |
| `--model` | project default | Model to use as the judge |
| `--format` | `table` | Output format: `table` or `json` |
| `--rubric` | none | Custom rubric file (reserved; currently returns an error) |
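The flag rules in the table above (two allowed formats, `--rubric` reserved) can be sketched with the standard library `flag` package. This is a stand-in: the PR does not say which CLI framework `cmd_quality.go` uses, and `parseQualityFlags` and `qualityOptions` are hypothetical names. Because `flag` stops at the first positional argument, the sketch parses a second time so that flags written after `<skill-path>`, as in the usage examples, are still accepted.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// qualityOptions holds the parsed command-line options (hypothetical type).
type qualityOptions struct {
	Skill, Model, Format, Rubric string
}

var errUsage = errors.New("usage: waza quality <skill-path> [--model M] [--format table|json]")

func parseQualityFlags(args []string) (qualityOptions, error) {
	var opts qualityOptions
	fs := flag.NewFlagSet("quality", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; callers print them
	fs.StringVar(&opts.Model, "model", "", "model to use as judge (empty means project default)")
	fs.StringVar(&opts.Format, "format", "table", "output format: table or json")
	fs.StringVar(&opts.Rubric, "rubric", "", "custom rubric file (reserved)")

	// First pass: flags before <skill-path>.
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	rest := fs.Args()
	if len(rest) == 0 {
		return opts, errUsage
	}
	opts.Skill = rest[0]
	// Second pass: flags after <skill-path>.
	if err := fs.Parse(rest[1:]); err != nil {
		return opts, err
	}
	if fs.NArg() != 0 {
		return opts, errUsage
	}

	if opts.Format != "table" && opts.Format != "json" {
		return opts, fmt.Errorf("invalid --format %q: must be table or json", opts.Format)
	}
	if opts.Rubric != "" {
		return opts, errors.New("--rubric is reserved and not yet supported")
	}
	return opts, nil
}

func main() {
	opts, err := parseQualityFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```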

Design decisions

The judge prompt requests structured JSON: `{dimensions: [{name, score, feedback}], overall_score, summary}`
The Copilot SDK is mocked in all tests, so no real LLM calls are made
Authentication failures produce a clear message pointing to `copilot login` (the same pattern as `waza models`)
Partial responses with validation issues are still displayed, with a warning
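The design decisions above (a fixed JSON schema, a mocked model in tests, and partial results shown with warnings) fit together roughly as in this sketch. The Go type names, the `Completer` interface, and the prompt text are all hypothetical; the PR mocks the Copilot SDK but does not show the seam it uses. The sketch also assumes the model may wrap its JSON in prose, so it decodes the outermost `{...}` span.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// DimensionScore and JudgeResponse mirror the requested JSON shape.
type DimensionScore struct {
	Name     string `json:"name"`
	Score    int    `json:"score"`
	Feedback string `json:"feedback"`
}

type JudgeResponse struct {
	Dimensions   []DimensionScore `json:"dimensions"`
	OverallScore float64          `json:"overall_score"`
	Summary      string           `json:"summary"`
}

// Completer is a hypothetical seam around the LLM call so tests can inject a fake.
type Completer interface {
	Complete(prompt string) (string, error)
}

// fakeCompleter returns a canned reply, standing in for the real SDK.
type fakeCompleter struct{ reply string }

func (f fakeCompleter) Complete(string) (string, error) { return f.reply, nil }

// parseJudgeResponse decodes the outermost {...} span and collects warnings.
// Only undecodable output is an error, so partial results can still be shown.
func parseJudgeResponse(raw string) (JudgeResponse, []string, error) {
	var resp JudgeResponse
	start, end := strings.Index(raw, "{"), strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return resp, nil, errors.New("no JSON object in judge response")
	}
	if err := json.Unmarshal([]byte(raw[start:end+1]), &resp); err != nil {
		return resp, nil, fmt.Errorf("decoding judge response: %w", err)
	}
	var warnings []string
	for _, d := range resp.Dimensions {
		if d.Score < 1 || d.Score > 5 {
			warnings = append(warnings, fmt.Sprintf("%s: score %d outside 1-5", d.Name, d.Score))
		}
	}
	if n := len(resp.Dimensions); n != 5 {
		warnings = append(warnings, fmt.Sprintf("expected 5 dimensions, got %d", n))
	}
	return resp, warnings, nil
}

// judge builds a prompt, calls the model through c, and parses the reply.
func judge(c Completer, skill string) (JudgeResponse, []string, error) {
	prompt := "Score this skill 1-5 on clarity, completeness, trigger_precision, " +
		"scope_coverage and anti_patterns. Reply with JSON only.\n\n" + skill
	raw, err := c.Complete(prompt)
	if err != nil {
		return JudgeResponse{}, nil, err
	}
	return parseJudgeResponse(raw)
}

func main() {
	c := fakeCompleter{reply: `Here is my evaluation: {"dimensions":[{"name":"clarity","score":4,"feedback":"clear steps"}],"overall_score":4,"summary":"good start"}`}
	resp, warnings, err := judge(c, "# my-skill")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
	fmt.Printf("overall %.1f: %s\n", resp.OverallScore, resp.Summary)
}
```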

Tests

go test ./... — all pass
go vet ./... — clean
Site builds: cd site && npm run build — ✅

Docs updated

README.md — new command section
site/src/content/docs/reference/cli.mdx — full CLI reference entry

…98

Add `waza quality <skill-path>` command that uses an LLM to evaluate skill content quality across five dimensions:
- clarity: instruction clarity and structure
- completeness: coverage of edge cases and detail level
- trigger_precision: USE FOR / DO NOT USE FOR definition quality
- scope_coverage: boundary clarity and capability explicitness
- anti_patterns: avoidance of vague/conflicting instructions

Implementation:
- internal/quality/rubric.go: dimension definitions and validation
- internal/quality/judge.go: prompt construction, LLM execution, response parsing
- internal/quality/report.go: table and JSON output formatting
- cmd/waza/cmd_quality.go: CLI command with --model, --format, --rubric flags
- 28 tests covering rubric validation, judge execution, response parsing, report formatting, auth errors, and edge cases
- Documentation in README.md and site CLI reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Apply gofmt formatting to scope_reduction.go (struct alignment, blank line)
- Apply gofmt formatting to scope_reduction_test.go (struct alignment)
- Fix errcheck: use comma-ok form for all type assertions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
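The errcheck fix refers to the comma-ok form of a type assertion: `v, ok := x.(T)` reports a mismatch through `ok` instead of panicking, and errcheck can be configured to flag the single-value form. The assertions actually changed in this PR are not shown; the sketch below is a generic illustration with a hypothetical helper `scoreFrom`. It relies on the fact that `encoding/json` decodes JSON numbers into `interface{}` values as `float64`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scoreFrom (hypothetical) reads an integer score from a generically decoded
// JSON object. Both the map lookup and the type assertion use comma-ok, so a
// missing key or a non-number value becomes an error rather than a panic.
func scoreFrom(m map[string]any, key string) (int, error) {
	v, ok := m[key]
	if !ok {
		return 0, fmt.Errorf("missing %q", key)
	}
	f, ok := v.(float64) // encoding/json stores JSON numbers as float64
	if !ok {
		return 0, fmt.Errorf("%q: expected number, got %T", key, v)
	}
	return int(f), nil
}

func main() {
	var m map[string]any
	if err := json.Unmarshal([]byte(`{"score": 4, "name": "clarity"}`), &m); err != nil {
		panic(err)
	}
	fmt.Println(scoreFrom(m, "score"))
	fmt.Println(scoreFrom(m, "name"))
}
```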

github-actions Bot enabled auto-merge (squash) April 22, 2026 18:14

Copilot AI and others added 2 commits April 22, 2026 14:16

fix: gofmt + staticcheck lint violations in quality package

78fa230

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer merged commit c473b41 into main Apr 22, 2026
6 checks passed

spboyer deleted the squad/98-waza-quality branch April 22, 2026 19:50

spboyer mentioned this pull request Feb 28, 2026

🎯 Waza Platform Roadmap - Tracking Issue #8

Closed

spboyer merged 3 commits into main from squad/98-waza-quality
