feat: Add eval scaffolding command (waza eval new) by spboyer · Pull Request #94 · microsoft/waza

spboyer · 2026-03-05T02:23:40Z

Closes #83

Working as Linus (Backend Developer)
⚠️ This task was flagged as "needs review" — please have a squad member review before merging.

chlowell · 2026-03-05T02:26:17Z

Should this go in init or new instead of a new verb?

codecov-commenter · 2026-03-05T02:27:54Z

Codecov Report

❌ Patch coverage is 87.84530% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@bac0893).

Files with missing lines	Patch %	Lines
cmd/waza/cmd_eval.go	87.77%	14 Missing and 8 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #94   +/-   ##
=======================================
  Coverage        ?   72.82%           
=======================================
  Files           ?      130           
  Lines           ?    14816           
  Branches        ?        0           
=======================================
  Hits            ?    10790           
  Misses          ?     3221           
  Partials        ?      805

Flag	Coverage Δ
go-implementation	`72.82% <87.84%> (?)`

Copilot

Pull request overview

Adds a new waza eval new <skill-name> CLI subcommand to scaffold an evaluation suite from a skill’s SKILL.md frontmatter, plus docs/tests to make it discoverable and verifiable.

Changes:

Introduce waza eval new command that parses SKILL.md triggers and generates eval.yaml + starter trigger/anti-trigger task YAMLs.
Add unit tests validating scaffold generation, custom output path behavior, and missing SKILL.md error handling.
Update CLI docs + README + command metadata expectations to include the new eval top-level command.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
site/src/content/docs/reference/cli.mdx	Documents the new `waza eval new` command in the CLI reference.
cmd/waza/root.go	Registers the new `eval` command in the root CLI.
cmd/waza/cmd_metadata_test.go	Updates metadata test to expect the new top-level `eval` command.
cmd/waza/cmd_eval_test.go	Adds tests covering scaffold generation, `--output`, and error cases.
cmd/waza/cmd_eval.go	Implements `waza eval new` scaffold generation logic.
README.md	Adds usage docs for `waza eval new`.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md	Adds squad session log (non-functional change).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md	Adds squad session log (non-functional change).
.squad/decisions.md	Records squad decisions (non-functional change).

Comments suppressed due to low confidence (2)

cmd/waza/cmd_eval.go:74

When --output is not provided, the default output path is hard-coded to evals/<skill-name>/eval.yaml. If the project uses a custom paths.evals in .waza.yaml, this will write scaffolding into the wrong directory. It would be more consistent with other commands to derive the default evals directory from project config/workspace detection.

	if outputPath == "" {
		outputPath = filepath.Join("evals", skillName, "eval.yaml")
	}
	tasksDir := filepath.Join(filepath.Dir(outputPath), "tasks")

site/src/content/docs/reference/cli.mdx:202

Flag docs list --output without indicating it takes a path argument. For consistency with the README and the actual flag help, consider documenting it as --output <path>.

| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml` |

spboyer

LGTM — Rusty. waza eval new is a clean scaffolding command. Good use of SKILL.md frontmatter parsing for positive/negative trigger generation. extractKeywords with stop words is smart. Tests validate generated YAML through validation.ValidateEvalBytes/ValidateTaskBytes — nice. README + CLI reference updated. Ship it. (Self-authored PR — cannot self-approve via API.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

wbreza

Code Review: PR #94 - feat: Add eval scaffolding command (waza eval new)

What Looks Good

Smart SKILL.md resolution: workspace-aware detection with .waza.yaml config, fallback to conventional paths
Safe overwrite protection and partial write cleanup
Generated YAML validated in tests via ValidateEvalBytes/ValidateTaskBytes
Custom output path works correctly
Docs fully updated

Discussion: Command Overlap

Before approving, I want to flag a naming/scope discussion:

waza new already creates a skill + eval scaffold together
waza eval new (this PR) creates eval scaffold from an existing skill
waza init detects skills missing evals and can create them following config

Questions worth resolving:

Should this be waza new eval (verb-noun pattern) instead of waza eval new? The verb-noun pattern (waza new [skill|eval]) is more discoverable and consistent.
Does waza init's gap-filling already cover the use case of adding evals to existing skills? If so, is a separate command needed or would init --skill=name suffice?
If we keep eval as a subcommand group, what other subcommands go under it? If it's just new, the group may be premature.

Findings

Medium:

1. Hardcoded 2 positive + 1 negative task structure may not suit skills with rich USE FOR sections (5+ phrases) or minimal descriptions.

Low:

2. Site docs show the argument as [skill-name] (optional notation) instead of <skill-name> (required).
3. extractKeywords stop words overlap with trigger_grader.go stop words.
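To illustrate the Medium finding, one option is to size the positive tasks from the number of parsed USE FOR phrases rather than a fixed count. This is a sketch under assumptions: `selectTriggerPhrases`, its `limit` parameter, and the placeholder text are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// selectTriggerPhrases is a hypothetical helper: it returns one phrase per
// positive task, capped at limit, and a single placeholder when the skill
// description yields no phrases at all.
func selectTriggerPhrases(phrases []string, limit int) []string {
	if len(phrases) == 0 {
		return []string{"TODO: add a prompt that should trigger this skill"}
	}
	if limit > 0 && len(phrases) > limit {
		return phrases[:limit]
	}
	return phrases
}

func main() {
	rich := []string{"deploy app", "scale service", "rotate keys", "view logs", "roll back", "add domain"}
	fmt.Println(len(selectTriggerPhrases(rich, 5)))                   // capped at the limit
	fmt.Println(len(selectTriggerPhrases([]string{"deploy app"}, 5))) // minimal description
	fmt.Println(len(selectTriggerPhrases(nil, 5)))                    // placeholder only
}
```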

Summary

Priority	Count
Critical	0
High	0
Medium	1
Low	2

Overall Assessment: Comment - implementation is solid, but the command naming and overlap with waza new and waza init should be discussed before merging.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

cmd/waza/cmd_eval.go:74

There’s test coverage for custom paths.skills, but no test verifying that the default output location respects paths.evals from .waza.yaml (or that output is anchored to the detected workspace root when invoked from inside a skill directory). Adding a regression test around the default outputPath behavior would help prevent generating evals in the wrong directory structure.

	if outputPath == "" {
		outputPath = filepath.Join("evals", skillName, "eval.yaml")
	}

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-10T16:09:20Z

> Should this go in init or new instead of a new verb?

@chlowell - good point. `waza new eval`?

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chlowell · 2026-03-10T16:29:41Z

I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

spboyer · 2026-03-10T16:37:15Z

> I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

We should consider merging init and new.

cc: @richardpark-msft

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@chlowell

…back

Addresses @chlowell's feedback to use 'waza new eval' instead of a separate 'eval' top-level verb. The eval subcommand is now registered under the existing 'new' command:

waza new [skill-name] → create full skill + eval (unchanged)
waza new eval <skill-name> → scaffold eval-only for existing skill

- Remove top-level 'eval' command wrapper
- Register newEvalNewCommand() under newNewCommand()
- Update all tests, README, and CLI docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-10T16:47:26Z

> I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

I made this change for now.

spboyer requested review from chlowell and richardpark-msft as code owners March 5, 2026 02:23

Copilot AI review requested due to automatic review settings March 5, 2026 02:23

spboyer self-assigned this Mar 5, 2026

github-actions Bot enabled auto-merge (squash) March 5, 2026 02:24

Copilot started reviewing on behalf of spboyer March 5, 2026 02:24

Copilot AI reviewed Mar 5, 2026

Comment thread cmd/waza/cmd_eval.go

Comment thread site/src/content/docs/reference/cli.mdx

Comment thread cmd/waza/cmd_eval_test.go Outdated

spboyer commented Mar 5, 2026

spboyer force-pushed the squad/83-eval-new branch from 79893e4 to 02fb261 Compare March 5, 2026 17:12

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #94

f016c5f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:38

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #94

c00d9c3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/83-eval-new branch from f016c5f to c00d9c3 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026

Comment thread site/src/content/docs/reference/cli.mdx Outdated

Comment thread README.md

Comment thread cmd/waza/cmd_eval.go

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address PR #94 eval scaffold review comments

c28fc26

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 6, 2026

fix: address review feedback on PR microsoft#94

a30cc05

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 6, 2026

fix: address PR microsoft#94 eval scaffold review comments

7b73e2d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/83-eval-new branch from c28fc26 to 7b73e2d Compare March 6, 2026 00:04

Copilot AI review requested due to automatic review settings March 6, 2026 00:04

Copilot started reviewing on behalf of spboyer March 6, 2026 00:06

wbreza reviewed Mar 6, 2026

Copilot AI reviewed Mar 6, 2026

Comment thread cmd/waza/cmd_eval.go

Comment thread cmd/waza/cmd_eval.go

spboyer added a commit that referenced this pull request Mar 10, 2026

fix: address review feedback on PR #94

39c52da

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit that referenced this pull request Mar 10, 2026

fix: address PR #94 eval scaffold review comments

d3e02c9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 10, 2026

fix: address review feedback on PR microsoft#94

04bb203

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 10, 2026

fix: address PR microsoft#94 eval scaffold review comments

ea4edd6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/83-eval-new branch from 7b73e2d to ea4edd6 Compare March 10, 2026 16:15

Copilot AI review requested due to automatic review settings March 10, 2026 16:26

Copilot started reviewing on behalf of spboyer March 10, 2026 16:29

Copilot AI reviewed Mar 10, 2026

Comment thread cmd/waza/cmd_metadata_test.go Outdated

Comment thread cmd/waza/cmd_new.go

spboyer and others added 5 commits March 10, 2026 12:40

feat: add eval scaffolding command microsoft#83

040e464

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address review feedback on PR microsoft#94

8149935

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address PR microsoft#94 eval scaffold review comments

491b981

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: remove 'eval' from metadata expected commands after refactor

37f86f1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/83-eval-new branch from 5d722be to 37f86f1 Compare March 10, 2026 16:43

chlowell approved these changes Mar 10, 2026

github-actions Bot merged commit f3371ce into microsoft:main Mar 10, 2026
6 checks passed

spboyer mentioned this pull request Mar 12, 2026

Release v0.21.0 #122

Merged

4 tasks

6 participants
