Skip to content

fix: validate grader config required fields#195

Merged
spboyer merged 3 commits into
mainfrom
squad/113-grader-config-validation
Apr 21, 2026
Merged

fix: validate grader config required fields#195
spboyer merged 3 commits into
mainfrom
squad/113-grader-config-validation

Conversation

@spboyer

@spboyer spboyer commented Apr 21, 2026

Copy link
Copy Markdown
Member

Fixes #113

Grader configurations now validate type-specific required fields at parse time:
- code graders require at least one assertion in config.assertions
- diff graders require at least one file in config.expected_files
- json_schema graders require config.schema or config.schema_file
- program graders require config.command
- trigger graders require config.skill_path
- action_sequence graders require config.expected_actions
- skill_invocation graders require config.required_skills
- tool_constraint graders require config.expect_tools or config.reject_tools
- file graders require at least one of must_exist, must_not_exist, or content_patterns

The strict YAML parser (KnownFields) already catches fields at the wrong nesting level.
This change adds semantic validation to catch graders with empty/missing required fields.

Validation is enforced in both GraderConfig (spec-level) and ValidatorInline (task-level)
graders via their UnmarshalYAML methods.

Fixes #113

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 21, 2026 14:52
@github-actions github-actions Bot enabled auto-merge (squash) April 21, 2026 14:52
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds explicit validation for grader/validator configs during YAML unmarshalling so obvious misconfigurations (e.g., missing required fields per grader type) fail fast instead of being silently ignored, and updates demo documentation accordingly.

Changes:

  • Add Validate() methods for GraderConfig and ValidatorInline, invoked during UnmarshalYAML.
  • Add unit tests covering required-field validation failures across multiple grader kinds.
  • Revise DEMO-SCRIPT.md workflow/CLI examples.
Show a summary per file
File Description
internal/models/testcase.go Validates task-level inline validators (graders) after YAML decode.
internal/models/spec.go Validates eval-level grader configs after YAML decode.
internal/models/grader_validation_test.go New tests asserting validation errors for missing required fields.
DEMO-SCRIPT.md Updates demo flow and CLI examples (needs alignment with actual CLI behavior).

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 5

if !ok {
return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", v.Identifier, v.Parameters)
}
if len(params.Assertions) == 0 {

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For code validators, users can still supply top-level assertions (populates v.Checks), but this validation only checks config.assertions. That means someone who put assertions at the wrong level will get an error that implies they provided none, even though v.Checks is non-empty. Consider detecting len(v.Checks)>0 && len(params.Assertions)==0 and returning a more specific message like “assertions must be under config.assertions (top-level assertions are ignored)”.

Suggested change
if len(params.Assertions) == 0 {
if len(params.Assertions) == 0 {
if len(v.Checks) > 0 {
return fmt.Errorf("code grader %q: assertions must be under config.assertions (top-level assertions are ignored)", v.Identifier)
}

Copilot uses AI. Check for mistakes.
Comment thread internal/models/spec.go
Comment on lines +114 to +125
// Validate checks that the grader config has required fields for its type.
func (g *GraderConfig) Validate() error {
switch g.Kind {
case GraderKindInlineScript:
params, ok := g.Parameters.(InlineScriptGraderParameters)
if !ok {
return fmt.Errorf("code grader %q: expected InlineScriptGraderParameters, got %T", g.Identifier, g.Parameters)
}
if len(params.Assertions) == 0 {
return fmt.Errorf("code grader %q: must have at least one assertion in config.assertions", g.Identifier)
}

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grader required-field validation logic is duplicated here and in ValidatorInline.Validate (internal/models/testcase.go). To avoid the two drifting over time when grader kinds/fields change, consider extracting a shared helper (e.g., validateGraderRequiredFields(kind, identifier, params)) and calling it from both places.

Copilot uses AI. Check for mistakes.
Comment thread DEMO-SCRIPT.md
Comment on lines +79 to +90
**Expected Output:**
```
✓ Created skill: code-reviewer

Structure created:
skills/code-reviewer/
├── SKILL.md
├── index.ts (or index.py)
└── code-reviewer.agent.json (if using Azure SDK)

> "Notice it created **fixtures/** with sample Azure Functions code. This gives the skill real context to work with."

---

## Part 2b: LLM-Assisted Generation (2 min) ⭐ NEW

### Talking Points
Next: Create an eval suite
waza new eval code-reviewer

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Expected Output” for waza new skill code-reviewer doesn’t match what the CLI currently prints or scaffolds. waza new skill prints “🔧 Scaffolding skill …” plus an inventory (and creates SKILL.md + evals/ + tasks/ + fixtures/ + workflow/README/.gitignore), not index.ts / code-reviewer.agent.json. Please update this block to reflect the actual scaffold/output (or avoid exact output if it’s meant to be illustrative).

Copilot uses AI. Check for mistakes.
Comment thread DEMO-SCRIPT.md
Comment on lines 101 to +113
**Expected Output:**
```
waza v0.3.0

Scanning GitHub repository: microsoft/GitHub-Copilot-for-Azure
✓ Found 15 skill(s)
✓ Created eval suite: code-reviewer-eval

Parsing: azure-functions
✓ Parsed skill: azure-functions
Description: Build and deploy serverless Azure Functions...

Using LLM-assisted generation with claude-sonnet-4-20250514...

⠋ Generating tasks...
✓ Generated 5 tasks
⠋ Generating fixtures...
✓ Generated 4 fixtures
⠋ Suggesting graders...
✓ Suggested 3 graders

✓ Created tasks/task-001.yaml (Deploy HTTP Trigger Function)
✓ Created tasks/task-002.yaml (Create Timer-based Function)
✓ Created tasks/task-003.yaml (Debug Cold Start Issues)
✓ Created tasks/task-004.yaml (Configure Function App Settings)
✓ Created tasks/task-005.yaml (Set Up CI/CD for Functions)
✓ Created fixtures/function_app.py
✓ Created fixtures/host.json
✓ Created fixtures/local.settings.json
✓ Created fixtures/requirements.txt
✓ Created eval.yaml
✓ Created trigger_tests.yaml

╭────────────────────── ✓ Success ──────────────────────╮
│ Generated eval suite at: azure-functions-eval-assisted │
│ │
│ Run with: │
│ waza run azure-functions-eval-assisted/eval.yaml │
╰────────────────────────────────────────────────────────╯
```

### Compare the Tasks

```bash
# Pattern-based task
cat azure-functions-eval/tasks/task-001.yaml
Structure created:
evals/code-reviewer-eval/
├── eval.yaml
├── tasks/
│ └── task-001.yaml
├── fixtures/
│ └── example.txt
└── graders/
└── custom_grader.py

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Expected Output” / directory structure for waza new eval code-reviewer looks incorrect for the current implementation. waza new eval scaffolds evals/<skill>/eval.yaml plus a few tasks/positive-trigger-*.yaml files and prints “✅ Eval scaffold created for …” with file paths; it does not create fixtures/ or graders/ directories. Please align this section with the actual command behavior/output.

Copilot uses AI. Check for mistakes.
Comment thread DEMO-SCRIPT.md
Comment on lines 243 to +247
graders:
- name: found_issues
type: code
type: regex
config:
assertions:
- "len(output) > 50"
- "'improve' in output.lower() or 'suggest' in output.lower() or 'issue' in output.lower()"
pattern: '(improve|suggest|issue)'

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This task example uses type: regex with config.pattern, but the supported grader kind is type: text and the config fields are contains/not_contains/regex_match/regex_not_match (no pattern field). As written, this YAML won’t decode into a supported grader parameters type. Update the example to type: text + regex_match: ['(improve|suggest|issue)'] (or similar).

See below for a potential fix:

    type: text
    config:
      regex_match:
        - '(improve|suggest|issue)'

Copilot uses AI. Check for mistakes.
…tion

Test fixtures for coverage report tests used file and diff graders
without required config fields, causing parse failures after the
grader config validation added in #113. Updated fixtures to include
valid config (must_exist and expected_files respectively).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer merged commit 900c10f into main Apr 21, 2026
4 of 6 checks passed
@spboyer spboyer deleted the squad/113-grader-config-validation branch April 21, 2026 15:09
spboyer pushed a commit that referenced this pull request Apr 21, 2026
The grader config validation from PR #195 correctly rejects code
graders without assertions. Updated the test eval YAML to include
a config.assertions field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer

spboyer commented Apr 21, 2026

Copy link
Copy Markdown
Member Author

Addressed two CI issues introduced by this PR:

  1. gofmt formatting — \internal/models/spec.go\ and \internal/models/testcase.go\ were not properly formatted. Applied \gofmt -w\ (committed in a prior commit on this branch).

  2. Test fixtures — \cmd/waza/cmd_coverage_test.go\ fixtures used \ ile\ and \diff\ graders without the required config fields (\must_exist/\�xpected_files), causing eval parse failures in coverage report tests. Updated fixtures to include valid minimal config.

The remaining CI failure (\TestHandler_EvalGet_Success\ in \internal/jsonrpc) is pre-existing on \main\ and unrelated to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Grader configuration can be incorrect with no visible error

3 participants