fix: validate grader config required fields by spboyer · Pull Request #195 · microsoft/waza

spboyer · 2026-04-21T14:52:18Z

Fixes #113

Grader configurations now validate type-specific required fields at parse time: - code graders require at least one assertion in config.assertions - diff graders require at least one file in config.expected_files - json_schema graders require config.schema or config.schema_file - program graders require config.command - trigger graders require config.skill_path - action_sequence graders require config.expected_actions - skill_invocation graders require config.required_skills - tool_constraint graders require config.expect_tools or config.reject_tools - file graders require at least one of must_exist, must_not_exist, or content_patterns The strict YAML parser (KnownFields) already catches fields at the wrong nesting level. This change adds semantic validation to catch graders with empty/missing required fields. Validation is enforced in both GraderConfig (spec-level) and ValidatorInline (task-level) graders via their UnmarshalYAML methods. Fixes #113 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds explicit validation for grader/validator configs during YAML unmarshalling so obvious misconfigurations (e.g., missing required fields per grader type) fail fast instead of being silently ignored, and updates demo documentation accordingly.

Changes:

Add Validate() methods for GraderConfig and ValidatorInline, invoked during UnmarshalYAML.
Add unit tests covering required-field validation failures across multiple grader kinds.
Revise DEMO-SCRIPT.md workflow/CLI examples.

Show a summary per file

File	Description
internal/models/testcase.go	Validates task-level inline validators (graders) after YAML decode.
internal/models/spec.go	Validates eval-level grader configs after YAML decode.
internal/models/grader_validation_test.go	New tests asserting validation errors for missing required fields.
DEMO-SCRIPT.md	Updates demo flow and CLI examples (needs alignment with actual CLI behavior).

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 5

Copilot · 2026-04-21T15:03:10Z

+		if !ok {
+			return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", v.Identifier, v.Parameters)
+		}
+		if len(params.Assertions) == 0 {


For code validators, users can still supply top-level assertions (populates v.Checks), but this validation only checks config.assertions. That means someone who put assertions at the wrong level will get an error that implies they provided none, even though v.Checks is non-empty. Consider detecting len(v.Checks)>0 && len(params.Assertions)==0 and returning a more specific message like “assertions must be under config.assertions (top-level assertions are ignored)”.

Suggested change

if len(params.Assertions) == 0 {

if len(params.Assertions) == 0 {

if len(v.Checks) > 0 {

return fmt.Errorf("code grader %q: assertions must be under config.assertions (top-level assertions are ignored)", v.Identifier)

}

Copilot · 2026-04-21T15:03:11Z

+// Validate checks that the grader config has required fields for its type.
+func (g *GraderConfig) Validate() error {
+	switch g.Kind {
+	case GraderKindInlineScript:
+		params, ok := g.Parameters.(InlineScriptGraderParameters)
+		if !ok {
+			return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", g.Identifier, g.Parameters)
+		}
+		if len(params.Assertions) == 0 {
+			return fmt.Errorf("code grader %q: must have at least one assertion in config.assertions", g.Identifier)
+		}
+


The grader required-field validation logic is duplicated here and in ValidatorInline.Validate (internal/models/testcase.go). To avoid the two drifting over time when grader kinds/fields change, consider extracting a shared helper (e.g., validateGraderRequiredFields(kind, identifier, params)) and calling it from both places.

Copilot · 2026-04-21T15:03:11Z

+**Expected Output:**
 ```
+✓ Created skill: code-reviewer
+
+Structure created:
+  skills/code-reviewer/
+  ├── SKILL.md
+  ├── index.ts (or index.py)
+  └── code-reviewer.agent.json (if using Azure SDK)

-> "Notice it created **fixtures/** with sample Azure Functions code. This gives the skill real context to work with."
-
---
-
-## Part 2b: LLM-Assisted Generation (2 min) ⭐ NEW
-
-### Talking Points
+Next: Create an eval suite
+  waza new eval code-reviewer


The “Expected Output” for waza new skill code-reviewer doesn’t match what the CLI currently prints or scaffolds. waza new skill prints “🔧 Scaffolding skill …” plus an inventory (and creates SKILL.md + evals/ + tasks/ + fixtures/ + workflow/README/.gitignore), not index.ts / code-reviewer.agent.json. Please update this block to reflect the actual scaffold/output (or avoid exact output if it’s meant to be illustrative).

Copilot · 2026-04-21T15:03:12Z

 **Expected Output:**
 ```
-waza v0.3.0
-
-Scanning GitHub repository: microsoft/GitHub-Copilot-for-Azure
-✓ Found 15 skill(s)
+✓ Created eval suite: code-reviewer-eval

-Parsing: azure-functions
-✓ Parsed skill: azure-functions
-  Description: Build and deploy serverless Azure Functions...
-
-Using LLM-assisted generation with claude-sonnet-4-20250514...
-
-⠋ Generating tasks...
-✓ Generated 5 tasks
-⠋ Generating fixtures...
-✓ Generated 4 fixtures
-⠋ Suggesting graders...
-✓ Suggested 3 graders
-
-✓ Created tasks/task-001.yaml (Deploy HTTP Trigger Function)
-✓ Created tasks/task-002.yaml (Create Timer-based Function)
-✓ Created tasks/task-003.yaml (Debug Cold Start Issues)
-✓ Created tasks/task-004.yaml (Configure Function App Settings)
-✓ Created tasks/task-005.yaml (Set Up CI/CD for Functions)
-✓ Created fixtures/function_app.py
-✓ Created fixtures/host.json
-✓ Created fixtures/local.settings.json
-✓ Created fixtures/requirements.txt
-✓ Created eval.yaml
-✓ Created trigger_tests.yaml
-
-╭────────────────────── ✓ Success ──────────────────────╮
-│ Generated eval suite at: azure-functions-eval-assisted │
-│                                                        │
-│ Run with:                                              │
-│   waza run azure-functions-eval-assisted/eval.yaml │
-╰────────────────────────────────────────────────────────╯
-```
-
-### Compare the Tasks
-
-```bash
-# Pattern-based task
-cat azure-functions-eval/tasks/task-001.yaml
+Structure created:
+  evals/code-reviewer-eval/
+  ├── eval.yaml
+  ├── tasks/
+  │   └── task-001.yaml
+  ├── fixtures/
+  │   └── example.txt
+  └── graders/
+      └── custom_grader.py


The “Expected Output” / directory structure for waza new eval code-reviewer looks incorrect for the current implementation. waza new eval scaffolds evals/<skill>/eval.yaml plus a few tasks/positive-trigger-*.yaml files and prints “✅ Eval scaffold created for …” with file paths; it does not create fixtures/ or graders/ directories. Please align this section with the actual command behavior/output.

Copilot · 2026-04-21T15:03:12Z

 graders:
  - name: found_issues
-    type: code
+    type: regex
    config:
-      assertions:
-        - "len(output) > 50"
-        - "'improve' in output.lower() or 'suggest' in output.lower() or 'issue' in output.lower()"
+      pattern: '(improve|suggest|issue)'


This task example uses type: regex with config.pattern, but the supported grader kind is type: text and the config fields are contains/not_contains/regex_match/regex_not_match (no pattern field). As written, this YAML won’t decode into a supported grader parameters type. Update the example to type: text + regex_match: ['(improve|suggest|issue)'] (or similar).

See below for a potential fix:

type: text config: regex_match: - '(improve|suggest|issue)'

…tion Test fixtures for coverage report tests used file and diff graders without required config fields, causing parse failures after the grader config validation added in #113. Updated fixtures to include valid config (must_exist and expected_files respectively). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The grader config validation from PR #195 correctly rejects code graders without assertions. Updated the test eval YAML to include a config.assertions field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-04-21T15:17:14Z

Addressed two CI issues introduced by this PR:

gofmt formatting — \internal/models/spec.go\ and \internal/models/testcase.go\ were not properly formatted. Applied \gofmt -w\ (committed in a prior commit on this branch).
Test fixtures — \cmd/waza/cmd_coverage_test.go\ fixtures used \ile\ and \diff\ graders without the required config fields (\must_exist/\�xpected_files), causing eval parse failures in coverage report tests. Updated fixtures to include valid minimal config.

The remaining CI failure (\TestHandler_EvalGet_Success\ in \internal/jsonrpc) is pre-existing on \main\ and unrelated to this PR.

Copilot AI review requested due to automatic review settings April 21, 2026 14:52

spboyer requested review from chlowell, richardpark-msft and wbreza as code owners April 21, 2026 14:52

github-actions Bot enabled auto-merge (squash) April 21, 2026 14:52

Copilot started reviewing on behalf of spboyer April 21, 2026 14:53 View session

fix: run gofmt on spec.go and testcase.go

0e7150c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI reviewed Apr 21, 2026

View reviewed changes

spboyer merged commit 900c10f into main Apr 21, 2026
4 of 6 checks passed

spboyer deleted the squad/113-grader-config-validation branch April 21, 2026 15:09

spboyer mentioned this pull request Apr 21, 2026

Grader configuration can be incorrect with no visible error #113

Closed

spboyer mentioned this pull request Apr 21, 2026

fix: update EvalGet test fixture for code grader validation #197

Merged

spboyer mentioned this pull request Feb 28, 2026

🎯 Waza Platform Roadmap - Tracking Issue #8

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: validate grader config required fields#195

fix: validate grader config required fields#195
spboyer merged 3 commits into
mainfrom
squad/113-grader-config-validation

spboyer commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Copilot AI Apr 21, 2026

Uh oh!

Uh oh!

spboyer commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

spboyer commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

spboyer commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants