Skip to content

feat: Add eval scaffolding command (waza eval new)#94

Merged
github-actions[bot] merged 5 commits into
microsoft:mainfrom
spboyer:squad/83-eval-new
Mar 10, 2026
Merged

feat: Add eval scaffolding command (waza eval new)#94
github-actions[bot] merged 5 commits into
microsoft:mainfrom
spboyer:squad/83-eval-new

Conversation

@spboyer

@spboyer spboyer commented Mar 5, 2026

Copy link
Copy Markdown
Member

Closes #83

Working as Linus (Backend Developer)
⚠️ This task was flagged as "needs review" — please have a squad member review before merging.

Copilot AI review requested due to automatic review settings March 5, 2026 02:23
@spboyer spboyer self-assigned this Mar 5, 2026
@github-actions github-actions Bot enabled auto-merge (squash) March 5, 2026 02:24
@chlowell

chlowell commented Mar 5, 2026

Copy link
Copy Markdown
Member

Should this go in init or new instead of a new verb?

@codecov-commenter

codecov-commenter commented Mar 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.84530% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@bac0893). Learn more about missing BASE report.

Files with missing lines Patch % Lines
cmd/waza/cmd_eval.go 87.77% 14 Missing and 8 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #94   +/-   ##
=======================================
  Coverage        ?   72.82%           
=======================================
  Files           ?      130           
  Lines           ?    14816           
  Branches        ?        0           
=======================================
  Hits            ?    10790           
  Misses          ?     3221           
  Partials        ?      805           
Flag Coverage Δ
go-implementation 72.82% <87.84%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new waza eval new <skill-name> CLI subcommand to scaffold an evaluation suite from a skill’s SKILL.md frontmatter, plus docs/tests to make it discoverable and verifiable.

Changes:

  • Introduce waza eval new command that parses SKILL.md triggers and generates eval.yaml + starter trigger/anti-trigger task YAMLs.
  • Add unit tests validating scaffold generation, custom output path behavior, and missing SKILL.md error handling.
  • Update CLI docs + README + command metadata expectations to include the new eval top-level command.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
site/src/content/docs/reference/cli.mdx Documents the new waza eval new command in the CLI reference.
cmd/waza/root.go Registers the new eval command in the root CLI.
cmd/waza/cmd_metadata_test.go Updates metadata test to expect the new top-level eval command.
cmd/waza/cmd_eval_test.go Adds tests covering scaffold generation, --output, and error cases.
cmd/waza/cmd_eval.go Implements waza eval new scaffold generation logic.
README.md Adds usage docs for waza eval new.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md Adds squad session log (non-functional change).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md Adds squad session log (non-functional change).
.squad/decisions.md Records squad decisions (non-functional change).
Comments suppressed due to low confidence (2)

cmd/waza/cmd_eval.go:74

  • When --output is not provided, the default output path is hard-coded to evals/<skill-name>/eval.yaml. If the project uses a custom paths.evals in .waza.yaml, this will write scaffolding into the wrong directory. It would be more consistent with other commands to derive the default evals directory from project config/workspace detection.
	if outputPath == "" {
		outputPath = filepath.Join("evals", skillName, "eval.yaml")
	}
	tasksDir := filepath.Join(filepath.Dir(outputPath), "tasks")

site/src/content/docs/reference/cli.mdx:202

  • Flag docs list --output without indicating it takes a path argument. For consistency with the README and the actual flag help, consider documenting it as --output <path>.
| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml` |

Comment thread cmd/waza/cmd_eval.go
Comment thread site/src/content/docs/reference/cli.mdx
Comment thread cmd/waza/cmd_eval_test.go Outdated

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — Rusty. waza eval new is a clean scaffolding command. Good use of SKILL.md frontmatter parsing for positive/negative trigger generation. extractKeywords with stop words is smart. Tests validate generated YAML through validation.ValidateEvalBytes/ValidateTaskBytes — nice. README + CLI reference updated. Ship it. (Self-authored PR — cannot self-approve via API.)

@spboyer spboyer force-pushed the squad/83-eval-new branch from 79893e4 to 02fb261 Compare March 5, 2026 17:12
spboyer added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 5, 2026 17:38
spboyer added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/83-eval-new branch from f016c5f to c00d9c3 Compare March 5, 2026 17:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread site/src/content/docs/reference/cli.mdx Outdated
Comment thread README.md
Comment thread cmd/waza/cmd_eval.go
spboyer added a commit that referenced this pull request Mar 5, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 6, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 6, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/83-eval-new branch from c28fc26 to 7b73e2d Compare March 6, 2026 00:04
Copilot AI review requested due to automatic review settings March 6, 2026 00:04

@wbreza wbreza left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #94 - feat: Add eval scaffolding command (waza eval new)

What Looks Good

  • Smart SKILL.md resolution: workspace-aware detection with .waza.yaml config, fallback to conventional paths
  • Safe overwrite protection and partial write cleanup
  • Generated YAML validated in tests via ValidateEvalBytes/ValidateTaskBytes
  • Custom output path works correctly
  • Docs fully updated

Discussion: Command Overlap

Before approving, I want to flag a naming/scope discussion:

  1. waza new already creates a skill + eval scaffold together
  2. waza eval new (this PR) creates eval scaffold from an existing skill
  3. waza init detects skills missing evals and can create them following config

Questions worth resolving:

  • Should this be waza new eval (verb-noun pattern) instead of waza eval new? The verb-noun pattern (waza new [skill|eval]) is more discoverable and consistent.
  • Does waza init's gap-filling already cover the use case of adding evals to existing skills? If so, is a separate command needed or would init --skill=name suffice?
  • If we keep eval as a subcommand group, what other subcommands go under it? If it's just new, the group may be premature.

Findings

Medium:

  1. Hardcoded 2 positive + 1 negative task structure may not suit skills with rich USE FOR sections (5+ phrases) or minimal descriptions.

Low:
2. Site docs use [skill-name] (optional) instead of (required).
3. extractKeywords stop words overlap with trigger_grader.go stop words.

Summary

Priority Count
Critical 0
High 0
Medium 1
Low 2

Overall Assessment: Comment - implementation is solid, but the command naming and overlap with waza new and waza init should be discussed before merging.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

cmd/waza/cmd_eval.go:74

  • There’s test coverage for custom paths.skills, but no test verifying that the default output location respects paths.evals from .waza.yaml (or that output is anchored to the detected workspace root when invoked from inside a skill directory). Adding a regression test around the default outputPath behavior would help prevent generating evals in the wrong directory structure.
	if outputPath == "" {
		outputPath = filepath.Join("evals", skillName, "eval.yaml")
	}

Comment thread cmd/waza/cmd_eval.go
Comment thread cmd/waza/cmd_eval.go
spboyer added a commit that referenced this pull request Mar 10, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 10, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer

spboyer commented Mar 10, 2026

Copy link
Copy Markdown
Member Author

Should this go in init or new instead of a new verb?

@chlowell - good point. waza new eval ?

spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 10, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit to spboyer/waza-fk that referenced this pull request Mar 10, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/83-eval-new branch from 7b73e2d to ea4edd6 Compare March 10, 2026 16:15
Copilot AI review requested due to automatic review settings March 10, 2026 16:26
@chlowell

Copy link
Copy Markdown
Member

I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread cmd/waza/cmd_metadata_test.go Outdated
Comment thread cmd/waza/cmd_new.go
@spboyer

spboyer commented Mar 10, 2026

Copy link
Copy Markdown
Member Author

I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

We should consider merging init and new

cc: @richardpark-msft

spboyer and others added 5 commits March 10, 2026 12:40
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…back

Addresses @chlowell's feedback to use 'waza new eval' instead of a
separate 'eval' top-level verb. The eval subcommand is now registered
under the existing 'new' command:

  waza new [skill-name]       → create full skill + eval (unchanged)
  waza new eval <skill-name>  → scaffold eval-only for existing skill

- Remove top-level 'eval' command wrapper
- Register newEvalNewCommand() under newNewCommand()
- Update all tests, README, and CLI docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/83-eval-new branch from 5d722be to 37f86f1 Compare March 10, 2026 16:43
@spboyer

spboyer commented Mar 10, 2026

Copy link
Copy Markdown
Member Author

I like waza new eval so we can logically extend new with additional resource types. There's still overlap with waza init though, which can also generate eval.yaml. Does init need an update to match new behavior here?

I made this change for now

@github-actions github-actions Bot merged commit f3371ce into microsoft:main Mar 10, 2026
6 checks passed
@spboyer spboyer mentioned this pull request Mar 12, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Eval scaffolding command — waza eval new

6 participants