fix: validate grader config required fields#195
Conversation
Grader configurations now validate type-specific required fields at parse time: - code graders require at least one assertion in config.assertions - diff graders require at least one file in config.expected_files - json_schema graders require config.schema or config.schema_file - program graders require config.command - trigger graders require config.skill_path - action_sequence graders require config.expected_actions - skill_invocation graders require config.required_skills - tool_constraint graders require config.expect_tools or config.reject_tools - file graders require at least one of must_exist, must_not_exist, or content_patterns The strict YAML parser (KnownFields) already catches fields at the wrong nesting level. This change adds semantic validation to catch graders with empty/missing required fields. Validation is enforced in both GraderConfig (spec-level) and ValidatorInline (task-level) graders via their UnmarshalYAML methods. Fixes #113 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds explicit validation for grader/validator configs during YAML unmarshalling so obvious misconfigurations (e.g., missing required fields per grader type) fail fast instead of being silently ignored, and updates demo documentation accordingly.
Changes:
- Add
Validate()methods forGraderConfigandValidatorInline, invoked duringUnmarshalYAML. - Add unit tests covering required-field validation failures across multiple grader kinds.
- Revise
DEMO-SCRIPT.mdworkflow/CLI examples.
Show a summary per file
| File | Description |
|---|---|
| internal/models/testcase.go | Validates task-level inline validators (graders) after YAML decode. |
| internal/models/spec.go | Validates eval-level grader configs after YAML decode. |
| internal/models/grader_validation_test.go | New tests asserting validation errors for missing required fields. |
| DEMO-SCRIPT.md | Updates demo flow and CLI examples (needs alignment with actual CLI behavior). |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 5
| if !ok { | ||
| return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", v.Identifier, v.Parameters) | ||
| } | ||
| if len(params.Assertions) == 0 { |
There was a problem hiding this comment.
For code validators, users can still supply top-level assertions (populates v.Checks), but this validation only checks config.assertions. That means someone who put assertions at the wrong level will get an error that implies they provided none, even though v.Checks is non-empty. Consider detecting len(v.Checks)>0 && len(params.Assertions)==0 and returning a more specific message like “assertions must be under config.assertions (top-level assertions are ignored)”.
| if len(params.Assertions) == 0 { | |
| if len(params.Assertions) == 0 { | |
| if len(v.Checks) > 0 { | |
| return fmt.Errorf("code grader %q: assertions must be under config.assertions (top-level assertions are ignored)", v.Identifier) | |
| } |
| // Validate checks that the grader config has required fields for its type. | ||
| func (g *GraderConfig) Validate() error { | ||
| switch g.Kind { | ||
| case GraderKindInlineScript: | ||
| params, ok := g.Parameters.(InlineScriptGraderParameters) | ||
| if !ok { | ||
| return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", g.Identifier, g.Parameters) | ||
| } | ||
| if len(params.Assertions) == 0 { | ||
| return fmt.Errorf("code grader %q: must have at least one assertion in config.assertions", g.Identifier) | ||
| } | ||
|
|
There was a problem hiding this comment.
The grader required-field validation logic is duplicated here and in ValidatorInline.Validate (internal/models/testcase.go). To avoid the two drifting over time when grader kinds/fields change, consider extracting a shared helper (e.g., validateGraderRequiredFields(kind, identifier, params)) and calling it from both places.
| **Expected Output:** | ||
| ``` | ||
| ✓ Created skill: code-reviewer | ||
|
|
||
| Structure created: | ||
| skills/code-reviewer/ | ||
| ├── SKILL.md | ||
| ├── index.ts (or index.py) | ||
| └── code-reviewer.agent.json (if using Azure SDK) | ||
|
|
||
| > "Notice it created **fixtures/** with sample Azure Functions code. This gives the skill real context to work with." | ||
|
|
||
| --- | ||
|
|
||
| ## Part 2b: LLM-Assisted Generation (2 min) ⭐ NEW | ||
|
|
||
| ### Talking Points | ||
| Next: Create an eval suite | ||
| waza new eval code-reviewer |
There was a problem hiding this comment.
The “Expected Output” for waza new skill code-reviewer doesn’t match what the CLI currently prints or scaffolds. waza new skill prints “🔧 Scaffolding skill …” plus an inventory (and creates SKILL.md + evals/ + tasks/ + fixtures/ + workflow/README/.gitignore), not index.ts / code-reviewer.agent.json. Please update this block to reflect the actual scaffold/output (or avoid exact output if it’s meant to be illustrative).
| **Expected Output:** | ||
| ``` | ||
| waza v0.3.0 | ||
|
|
||
| Scanning GitHub repository: microsoft/GitHub-Copilot-for-Azure | ||
| ✓ Found 15 skill(s) | ||
| ✓ Created eval suite: code-reviewer-eval | ||
|
|
||
| Parsing: azure-functions | ||
| ✓ Parsed skill: azure-functions | ||
| Description: Build and deploy serverless Azure Functions... | ||
|
|
||
| Using LLM-assisted generation with claude-sonnet-4-20250514... | ||
|
|
||
| ⠋ Generating tasks... | ||
| ✓ Generated 5 tasks | ||
| ⠋ Generating fixtures... | ||
| ✓ Generated 4 fixtures | ||
| ⠋ Suggesting graders... | ||
| ✓ Suggested 3 graders | ||
|
|
||
| ✓ Created tasks/task-001.yaml (Deploy HTTP Trigger Function) | ||
| ✓ Created tasks/task-002.yaml (Create Timer-based Function) | ||
| ✓ Created tasks/task-003.yaml (Debug Cold Start Issues) | ||
| ✓ Created tasks/task-004.yaml (Configure Function App Settings) | ||
| ✓ Created tasks/task-005.yaml (Set Up CI/CD for Functions) | ||
| ✓ Created fixtures/function_app.py | ||
| ✓ Created fixtures/host.json | ||
| ✓ Created fixtures/local.settings.json | ||
| ✓ Created fixtures/requirements.txt | ||
| ✓ Created eval.yaml | ||
| ✓ Created trigger_tests.yaml | ||
|
|
||
| ╭────────────────────── ✓ Success ──────────────────────╮ | ||
| │ Generated eval suite at: azure-functions-eval-assisted │ | ||
| │ │ | ||
| │ Run with: │ | ||
| │ waza run azure-functions-eval-assisted/eval.yaml │ | ||
| ╰────────────────────────────────────────────────────────╯ | ||
| ``` | ||
|
|
||
| ### Compare the Tasks | ||
|
|
||
| ```bash | ||
| # Pattern-based task | ||
| cat azure-functions-eval/tasks/task-001.yaml | ||
| Structure created: | ||
| evals/code-reviewer-eval/ | ||
| ├── eval.yaml | ||
| ├── tasks/ | ||
| │ └── task-001.yaml | ||
| ├── fixtures/ | ||
| │ └── example.txt | ||
| └── graders/ | ||
| └── custom_grader.py |
There was a problem hiding this comment.
The “Expected Output” / directory structure for waza new eval code-reviewer looks incorrect for the current implementation. waza new eval scaffolds evals/<skill>/eval.yaml plus a few tasks/positive-trigger-*.yaml files and prints “✅ Eval scaffold created for …” with file paths; it does not create fixtures/ or graders/ directories. Please align this section with the actual command behavior/output.
| graders: | ||
| - name: found_issues | ||
| type: code | ||
| type: regex | ||
| config: | ||
| assertions: | ||
| - "len(output) > 50" | ||
| - "'improve' in output.lower() or 'suggest' in output.lower() or 'issue' in output.lower()" | ||
| pattern: '(improve|suggest|issue)' |
There was a problem hiding this comment.
This task example uses type: regex with config.pattern, but the supported grader kind is type: text and the config fields are contains/not_contains/regex_match/regex_not_match (no pattern field). As written, this YAML won’t decode into a supported grader parameters type. Update the example to type: text + regex_match: ['(improve|suggest|issue)'] (or similar).
See below for a potential fix:
type: text
config:
regex_match:
- '(improve|suggest|issue)'
…tion Test fixtures for coverage report tests used file and diff graders without required config fields, causing parse failures after the grader config validation added in #113. Updated fixtures to include valid config (must_exist and expected_files respectively). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The grader config validation from PR #195 correctly rejects code graders without assertions. Updated the test eval YAML to include a config.assertions field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed two CI issues introduced by this PR:
The remaining CI failure (\TestHandler_EvalGet_Success\ in \internal/jsonrpc) is pre-existing on \main\ and unrelated to this PR. |
Fixes #113