Add in custom YAML deserializers for our config by richardpark-msft · Pull Request #106 · microsoft/waza

richardpark-msft · 2026-03-10T20:19:47Z

Changes our YAML serialization/deserialization to be done one-way. I need this change elsewhere: I auto-generate YAML files and want to make sure all the schemas are consistent.


Copilot

Pull request overview

This PR refactors grader configuration loading to use typed, polymorphic YAML decoding (models.GraderParameters) instead of map[string]any + mapstructure, and updates grader constructors/call sites accordingly to support one-way, schema-consistent YAML generation.

Changes:

Add typed grader parameter structs and polymorphic YAML decoding for eval and task grader configs.
Simplify graders.Create and grader constructors to accept typed parameter structs.
Update tests and orchestration runner logic to use the new typed config flow.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.


File	Description
internal/models/grader_params.go	Introduces typed `GraderParameters` and decoding helpers for per-grader YAML config payloads.
internal/models/grader_params_test.go	Adds coverage for polymorphic decode behavior for spec graders and task validators.
internal/models/spec.go	Switches spec grader config from `map[string]any` to typed `GraderParameters` via custom YAML unmarshal.
internal/models/testcase.go	Switches task inline validator config from `map[string]any` to typed `GraderParameters` via custom YAML unmarshal.
internal/graders/grader.go	Replaces mapstructure-based factory with type-switch on `models.GraderParameters`.
internal/orchestration/runner.go	Updates grader creation and applies defaults via typed parameters.
internal/graders/* + *_test.go	Updates individual graders/tests to accept typed parameter structs.
internal/orchestration/runner_orchestration_test.go	Updates orchestration tests to use typed parameter structs.
cmd/waza/cmd_run_suggest_test.go	Updates test grader params to match typed schema (`contains` as `[]string`).
go.mod	Moves lipgloss to indirect dependency.
internal/orchestration/judge_model_test.go	Removes injectJudgeModel unit tests (function deleted).
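The table says `graders.Create` now type-switches on `models.GraderParameters` and that the runner applies defaults through the typed parameters. A minimal sketch of both ideas, with hypothetical names throughout (`TextParameters`, `PromptParameters`, `Grader`, `applyDefaults`). The judge-model default is also an assumption: the removal of `injectJudgeModel` hints at something similar, but the sketch is not waza's actual runner logic.

```go
package main

import (
	"errors"
	"fmt"
)

// GraderParameters is a hypothetical stand-in for models.GraderParameters.
type GraderParameters interface{ isGraderParameters() }

// TextParameters and PromptParameters are hypothetical parameter structs.
type TextParameters struct{ Contains []string }

type PromptParameters struct {
	Prompt string
	Model  string // judge model; empty means "use the run's default"
}

func (*TextParameters) isGraderParameters()   {}
func (*PromptParameters) isGraderParameters() {}

// Grader is a hypothetical, deliberately tiny grader interface.
type Grader interface{ Name() string }

type textGrader struct{ params *TextParameters }
type promptGrader struct{ params *PromptParameters }

func (textGrader) Name() string     { return "text" }
func (g promptGrader) Name() string { return "prompt(" + g.params.Model + ")" }

// Create dispatches on the concrete parameter type. Decoding already
// produced a typed struct, so no mapstructure step is needed here.
func Create(params GraderParameters) (Grader, error) {
	switch p := params.(type) {
	case *TextParameters:
		if p == nil {
			return nil, errors.New("text grader: nil parameters")
		}
		return textGrader{params: p}, nil
	case *PromptParameters:
		if p == nil || p.Model == "" {
			return nil, errors.New("prompt grader: no judge model configured")
		}
		return promptGrader{params: p}, nil
	default:
		return nil, fmt.Errorf("unsupported grader parameters %T", params)
	}
}

// applyDefaults fills unset fields in place and never overwrites values
// that the config file set explicitly.
func applyDefaults(params GraderParameters, defaultModel string) {
	if p, ok := params.(*PromptParameters); ok && p != nil && p.Model == "" {
		p.Model = defaultModel
	}
}

func main() {
	params := &PromptParameters{Prompt: "Did the agent finish the task?"}
	applyDefaults(params, "example-judge-model") // hypothetical model name
	g, err := Create(params)
	if err != nil {
		panic(err)
	}
	fmt.Println(g.Name()) // prints prompt(example-judge-model)
}
```

A type switch keeps dispatch in one place and gives each branch a typed `p`, so field access is checked by the compiler; the trade-off is that a new grader kind needs a case in both the decoder and `Create`.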


Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.


Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 3 comments.


Copilot

Pull request overview

Copilot reviewed 35 out of 36 changed files in this pull request and generated 4 comments.

codecov-commenter · 2026-03-10T23:43:56Z

Codecov Report

❌ Patch coverage is 67.35751% with 63 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@f3371ce).

Files with missing lines	Patch %	Lines
internal/models/grader_params.go	34.78%	28 Missing and 2 partials ⚠️
internal/graders/grader.go	56.00%	11 Missing ⚠️
internal/graders/diff_grader.go	50.00%	4 Missing ⚠️
internal/models/spec.go	82.60%	2 Missing and 2 partials ⚠️
internal/models/testcase.go	80.95%	2 Missing and 2 partials ⚠️
internal/orchestration/runner.go	75.00%	4 Missing ⚠️
internal/execution/copilot_client_wrappers.go	0.00%	2 Missing ⚠️
internal/graders/inline_script_grader.go	75.00%	1 Missing and 1 partial ⚠️
internal/execution/copilot.go	0.00%	0 Missing and 1 partial ⚠️
internal/graders/prompt_grader.go	50.00%	0 Missing and 1 partial ⚠️


@@           Coverage Diff           @@
##             main     #106   +/-   ##
=======================================
  Coverage        ?   72.97%           
=======================================
  Files           ?      131           
  Lines           ?    14817           
  Branches        ?        0           
=======================================
  Hits            ?    10812           
  Misses          ?     3204           
  Partials        ?      801
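As a sanity check on the report: Codecov computes coverage as hits divided by all tracked lines, counting partially covered lines as not covered. The project figures above are consistent with that. For the patch, the Missing and partial counts in the per-file table sum to the reported 63 uncovered lines; the report does not state the patch size, but 67.35751% with 63 uncovered lines implies a 193-line patch with 130 covered lines.

```latex
\[
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
= \frac{10812}{10812 + 3204 + 801} = \frac{10812}{14817} \approx 72.97\%
\]
\[
\text{patch coverage} = \frac{193 - 63}{193} = \frac{130}{193} \approx 67.3575\%
\]
```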

Flag	Coverage Δ
go-implementation	`72.97% <67.35%> (?)`

Flags with carried forward coverage won't be shown.


…rosoft#160) Closes microsoft#106

Adapts Azure ML's tool_call evaluator rubrics (tool_call_accuracy, tool_selection, tool_input_accuracy, tool_output_utilization) as waza-compatible YAML configs for the prompt grader.

## What's included

| File | Evaluates | Scale |
|------|-----------|-------|
| `tool_call_accuracy.yaml` | Overall tool call effectiveness | 1–5 ordinal → 0.0–1.0 |
| `tool_selection.yaml` | Right tools chosen, none missed | Binary → 0.0/1.0 |
| `tool_input_accuracy.yaml` | Parameter correctness | Binary → 0.0/1.0 |
| `tool_output_utilization.yaml` | Correct use of tool results | Binary → 0.0/1.0 |
| `README.md` | Usage guide and rubric structure docs | - |

## Dependencies

These are config artifacts (YAML + docs). They become usable once the `prompt` grader (microsoft#104) merges.

## Source

Adapted from [Azure ML built-in evaluators](https://github.com/Azure/azureml-assets/tree/main/assets/evaluators/builtin) (MIT License).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…t#161) Closes microsoft#107

Adapts Azure ML's task evaluation rubrics as waza-compatible YAML configs for the prompt grader.

## Rubrics added

| Rubric | Score type | Scale | Description |
|--------|-----------|-------|-------------|
| `task_adherence` | binary flag | 0.0 / 1.0 | 3-dimension eval (goal/rule/procedure); flagged=true on any material failure |
| `task_completion` | binary | 0.0 / 1.0 | Was the task fully completed? Outcome-focused |
| `intent_resolution` | ordinal 1-5 | 0.0–1.0 | How well did the agent resolve the user's intent? |
| `response_completeness` | ordinal 1-5 | 0.0–1.0 | How thoroughly does the response cover ground truth? |

## Structure

Each rubric YAML includes:

- `evaluation_criteria`: detailed rubric text adapted from Azure ML `.prompty` files
- `rating_levels`: scoring scale with descriptions
- `score_normalization`: raw score → 0.0-1.0 mapping
- `input_mapping`: waza graders.Context → rubric input mapping
- `chain_of_thought`: step-by-step LLM judge instructions

## Source

Adapted from [Azure/azure-sdk-for-python](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators) evaluators:

- `TaskAdherenceEvaluator`
- `TaskCompletionEvaluator`
- `IntentResolutionEvaluator`
- `ResponseCompletenessEvaluator`

> Note: The `examples/rubrics/README.md` is being created separately in microsoft#106.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
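The rubric commits above list `1–5 ordinal → 0.0–1.0` and `Binary → 0.0/1.0` scales without spelling out the mapping. A minimal Go sketch, assuming a linear mapping `(score - 1) / (max - 1)` for ordinal ratings; the real `score_normalization` blocks in those YAML files may define different values, and `normalizeOrdinal`/`normalizeBinary` are hypothetical helpers, not waza functions.

```go
package main

import "fmt"

// normalizeOrdinal maps a rating on a 1..maxScore scale linearly onto
// 0.0..1.0, so the lowest rating becomes 0.0 and the highest 1.0.
// The linear shape is an assumption, not taken from the rubric files.
func normalizeOrdinal(score, maxScore int) (float64, error) {
	if maxScore < 2 || score < 1 || score > maxScore {
		return 0, fmt.Errorf("score %d outside 1..%d", score, maxScore)
	}
	return float64(score-1) / float64(maxScore-1), nil
}

// normalizeBinary maps a pass/fail verdict onto 1.0 or 0.0.
func normalizeBinary(passed bool) float64 {
	if passed {
		return 1.0
	}
	return 0.0
}

func main() {
	for s := 1; s <= 5; s++ {
		v, _ := normalizeOrdinal(s, 5)
		fmt.Printf("%d -> %.2f\n", s, v)
	}
	fmt.Println(normalizeBinary(true), normalizeBinary(false))
}
```

Rejecting out-of-range scores, rather than clamping them, surfaces a judge model that ignored the rating scale.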

Richard Park added 4 commits March 10, 2026 19:46

All manual changes have been made.

94883ae

Last fixup to get all the other locations to use Parameters instead of their own custom args types.

4ababad

Remove some more unneeded stuff.

02c0501

go mod tidy

8dfd497

richardpark-msft requested review from chlowell and spboyer as code owners March 10, 2026 20:19

Copilot AI review requested due to automatic review settings March 10, 2026 20:19

github-actions Bot enabled auto-merge (squash) March 10, 2026 20:20

Copilot started reviewing on behalf of richardpark-msft March 10, 2026 20:21

Unneeded convert

43d8eb9

Copilot AI reviewed Mar 10, 2026


Comment thread internal/models/grader_params.go Outdated

Comment thread internal/models/grader_params.go Outdated

Comment thread internal/graders/file_grader.go Outdated

Comment thread internal/orchestration/runner.go

Apply suggestions from code review

8b53cf9

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 10, 2026 20:31

Copilot started reviewing on behalf of richardpark-msft March 10, 2026 20:32

Copilot AI reviewed Mar 10, 2026


Comment thread internal/models/spec.go

Comment thread internal/models/testcase.go

Comment thread internal/orchestration/runner.go

Comment thread internal/models/grader_params.go

Richard Park added 3 commits March 10, 2026 20:39

Adding in defensive check, as advised by copilot.

1726f1f

Merge remote-tracking branch 'refs/remotes/origin/wz-yaml-fixes' into wz-yaml-fixes

bc911a2

Oops, didn't mean to include that 1/2 test.

65abbad

Copilot AI review requested due to automatic review settings March 10, 2026 20:43

Copilot started reviewing on behalf of richardpark-msft March 10, 2026 20:44

Richard Park and others added 2 commits March 10, 2026 20:48

Remove unused import

8d3aeba

Merge branch 'main' into wz-yaml-fixes

fdcb2f4

Copilot AI reviewed Mar 10, 2026


Comment thread internal/models/grader_params_test.go

Comment thread internal/graders/file_grader.go Outdated

Comment thread internal/models/grader_params.go

Richard Park added 3 commits March 10, 2026 20:55

Updating copilot SDK, fixing some old syntax.

f303de6

Merge branch 'wz-yaml-fixes' of https://github.com/richardpark-msft/waza into wz-yaml-fixes

59e9f68

Remove unneeded models, use the constants in an area that used to be hardcoded.

36a205c

Copilot AI review requested due to automatic review settings March 10, 2026 20:59

Copilot started reviewing on behalf of richardpark-msft March 10, 2026 21:00

Swap to disconnect, as indicated by the SDK's deprecation notice.

560c45c

Copilot AI reviewed Mar 10, 2026


Comment thread internal/webapi/additional_test.go

Comment thread internal/models/events_test.go

Comment thread internal/execution/session_events_collector_test.go

Comment thread internal/orchestration/runner.go

spboyer approved these changes Mar 10, 2026


github-actions Bot merged commit d3e8714 into microsoft:main Mar 10, 2026
6 checks passed

richardpark-msft deleted the wz-yaml-fixes branch March 10, 2026 21:12

spboyer mentioned this pull request Feb 28, 2026

🎯 Waza Platform Roadmap - Tracking Issue #8

Closed

4 participants