feat: add output_contains_any expectation field by spboyer · Pull Request #203 · microsoft/waza

spboyer · 2026-04-21T17:19:29Z

Summary

Adds MayInclude (output_contains_any) to TestExpectation, which passes when any of the listed strings appears in the agent output. This completes the trio of expectation-level text checks:

| YAML field | Go field | Semantics |
| --- | --- | --- |
| `output_contains` | `MustInclude` | ALL strings must appear (score = matched/total) |
| `output_not_contains` | `MustExclude` | NONE may appear (score = absent/total) |
| `output_contains_any` | `MayInclude` | ANY one must appear (binary 1.0/0.0) |

What changed

- `internal/models/testcase.go`: Added `MayInclude []string` field with yaml/json tags
- `internal/graders/run.go`: Added `evaluateExpectations()`, which evaluates all three expectation fields and synthesizes GraderResults. Wired into `RunAll()` after spec/task graders. All checks are case-insensitive.
- `internal/graders/run_test.go`: 4 new test functions covering each field individually and combined
- `internal/models/testcase_test.go`: YAML parsing test for the new field
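
For readers who want to see the scoring rules in one place, here is a minimal sketch of the three checks as the summary table describes them. It is not the code from `internal/graders/run.go`: the `expectation` type and the `countMatches` and `scoreExpectations` functions are hypothetical names, and the result keys simply reuse the YAML field names.

```go
package main

import (
	"fmt"
	"strings"
)

// expectation is a hypothetical stand-in for the three text-check fields.
type expectation struct {
	MustInclude []string // output_contains: all must appear
	MustExclude []string // output_not_contains: none may appear
	MayInclude  []string // output_contains_any: at least one must appear
}

// countMatches returns how many needles occur in output, ignoring case.
func countMatches(output string, needles []string) int {
	lower := strings.ToLower(output)
	n := 0
	for _, s := range needles {
		if strings.Contains(lower, strings.ToLower(s)) {
			n++
		}
	}
	return n
}

// scoreExpectations returns one score per non-empty field, keyed by YAML name.
func scoreExpectations(e expectation, output string) map[string]float64 {
	scores := map[string]float64{}
	if total := len(e.MustInclude); total > 0 {
		// score = matched/total
		scores["output_contains"] = float64(countMatches(output, e.MustInclude)) / float64(total)
	}
	if total := len(e.MustExclude); total > 0 {
		// score = absent/total
		absent := total - countMatches(output, e.MustExclude)
		scores["output_not_contains"] = float64(absent) / float64(total)
	}
	if len(e.MayInclude) > 0 {
		// binary: 1.0 if any string matched, otherwise 0.0
		score := 0.0
		if countMatches(output, e.MayInclude) > 0 {
			score = 1.0
		}
		scores["output_contains_any"] = score
	}
	return scores
}

func main() {
	e := expectation{
		MustInclude: []string{"agent", "missing"},
		MustExclude: []string{"error"},
		MayInclude:  []string{"option_a", "option_b"},
	}
	fmt.Println(scoreExpectations(e, "The agent chose Option_B"))
}
```

In this sketch a field that lists no strings produces no result at all, which is one plausible way to keep unused expectations from affecting the overall grade.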

Example YAML

expected:
  output_contains_any:
    - "option_a"
    - "option_b"
    - "option_c"

Note

MustInclude and MustExclude were previously defined in the struct but never evaluated. This PR wires up all three fields.

Working as Linus (Backend Developer)

Closes #137

Copilot

Pull request overview

Adds first-class support for an expectation-level “any-of” output text check (output_contains_any → MayInclude) and wires expectation-based text validations into the grading pipeline alongside existing spec/task graders.

Changes:

- Add `MayInclude []string` to `models.TestExpectation` with YAML/JSON tags.
- Evaluate `MustInclude`, `MustExclude`, and `MayInclude` in `graders.RunAll()` via a new `evaluateExpectations()` helper.
- Add unit tests for YAML parsing and expectation evaluation behavior.

Show a summary per file

| File | Description |
| --- | --- |
| internal/models/testcase.go | Extends `TestExpectation` with `MayInclude` (`output_contains_any`). |
| internal/graders/run.go | Adds expectation evaluation and merges synthesized results into `RunAll()` output. |
| internal/graders/run_test.go | Adds tests covering each expectation field and combined behavior. |
| internal/models/testcase_test.go | Adds YAML parsing test for `output_contains_any`. |

Copilot's findings

Comments suppressed due to low confidence (2)

internal/graders/run.go:114

These synthetic expectation results don’t set GraderResults.Type. type is required in JSON output (no omitempty) and is used in reporting (e.g., JUnit/web API). Please populate Type (likely models.GraderKindText) for this result.

results["_output_not_contains"] = models.GraderResults{
	Name:     "_output_not_contains",
	Score:    score,
	Passed:   score == 1.0,
	Feedback: feedback,
	Weight:   1.0,
}

internal/graders/run.go:139

These synthetic expectation results don’t set GraderResults.Type. type is required in JSON output (no omitempty) and is used in reporting (e.g., JUnit/web API). Please populate Type (likely models.GraderKindText) for this result.

results["_output_contains_any"] = models.GraderResults{
	Name:     "_output_contains_any",
	Score:    score,
	Passed:   foundAny,
	Feedback: feedback,
	Weight:   1.0,
}

Files reviewed: 6/6 changed files
Comments generated: 4

Copilot

Copilot's findings

Comments suppressed due to low confidence (2)

internal/graders/run.go:112

The synthetic expectation-derived GraderResults for output_not_contains don’t set Type. Since Type is required and surfaced in reports, populate it (likely models.GraderKindText) here too.

		results["_output_not_contains"] = models.GraderResults{
			Name:     "_output_not_contains",
			Score:    score,
			Passed:   score == 1.0,
			Feedback: feedback,

internal/graders/run.go:137

The synthetic expectation-derived GraderResults for output_contains_any don’t set Type. Please set Type (likely models.GraderKindText) so downstream output/reporting includes a correct grader kind.

		results["_output_contains_any"] = models.GraderResults{
			Name:     "_output_contains_any",
			Score:    score,
			Passed:   foundAny,
			Feedback: feedback,

Files reviewed: 13/13 changed files
Comments generated: 2

Copilot · 2026-04-21T17:47:25Z

+### output_contains_any
+
+**Type:** array of strings
+
+At least one of these strings must appear in the output (OR logic). Useful when an agent may express a concept in different ways. All checks are case-insensitive.


This section says “All checks are case-insensitive” under output_contains_any, but the implementation applies case-insensitive matching to output_contains and output_not_contains as well. Consider moving this note to a shared place (or repeating it under each field) to avoid implying only output_contains_any is case-insensitive.

Copilot · 2026-04-21T17:47:25Z

+		results := evaluateExpectations(tc, gCtx)
+		r, ok := results["_output_contains_any"]
+		assert.True(t, ok)
+		assert.Equal(t, 1.0, r.Score)
+		assert.True(t, r.Passed)
+		assert.Contains(t, r.Feedback, "beta")
+	})


These expectation-evaluation tests assert Score/Passed/Feedback, but don’t assert that the synthesized results populate GraderResults.Type. Adding an assertion here (and in the other expectation tests) would catch regressions since Type is required in output/reporting.

Add MayInclude (output_contains_any) to TestExpectation, which passes when ANY of the listed strings appear in the agent output. This completes the expectation-level text check trio alongside the existing MustInclude (output_contains) and MustExclude (output_not_contains).

Also wires up all three expectation fields in RunAll via the new evaluateExpectations helper. These fields were previously defined but never evaluated.

Closes #137

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The integration test step runs `waza run` with the mock executor, which produces generic output that won't match output_contains expectations. This is expected: the test validates that waza completes without crashing, not that mock evals pass.

Root cause: PR #203 (v0.27.0) wired up evaluateExpectations(), which made output_contains checks actually execute. Before that, these fields were defined but never evaluated, so the integration test passed silently.

Exit code 1 (eval failures) is now allowed. Exit codes >1 (crashes, panics) still fail CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
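
The exit-code policy in this commit message can be stated as a small predicate. This is only a sketch: `ciStepFails` is a hypothetical name, the actual CI step is not shown here, and treating negative codes (such as a process killed by a signal) as failures is an assumption beyond what the message states.

```go
package main

import "fmt"

// ciStepFails reports whether the integration step should fail CI.
// 0 means all evals passed, 1 means eval failures (tolerated with the
// mock executor), and anything else is treated as a crash or panic.
func ciStepFails(exitCode int) bool {
	return exitCode != 0 && exitCode != 1
}

func main() {
	for _, code := range []int{0, 1, 2} {
		fmt.Printf("exit %d -> fail CI: %v\n", code, ciStepFails(code))
	}
}
```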

Copilot AI review requested due to automatic review settings April 21, 2026 17:19

spboyer requested review from chlowell and richardpark-msft as code owners April 21, 2026 17:19

spboyer added the squad:linus Assigned to Linus (Backend Developer) label Apr 21, 2026

spboyer requested a review from wbreza as a code owner April 21, 2026 17:19

github-actions Bot enabled auto-merge (squash) April 21, 2026 17:19

Copilot started reviewing on behalf of spboyer April 21, 2026 17:20 View session

spboyer force-pushed the squad/137-output-contains-any branch from 9341a64 to 7b861f7 Compare April 21, 2026 17:21

Copilot AI reviewed Apr 21, 2026

Comment thread internal/graders/run.go

Comment thread internal/graders/run.go Outdated

Comment thread internal/graders/run_test.go

Comment thread internal/graders/run.go Outdated

Copilot AI review requested due to automatic review settings April 21, 2026 17:42

Copilot started reviewing on behalf of spboyer April 21, 2026 17:43 View session

Copilot AI reviewed Apr 21, 2026

spboyer force-pushed the squad/137-output-contains-any branch from a2dcb4a to 545b44a Compare April 21, 2026 18:22

Copilot AI added 2 commits April 21, 2026 14:27

fix: gofmt formatting and misspelling in run.go

16fa7e7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/137-output-contains-any branch from 545b44a to 16fa7e7 Compare April 21, 2026 18:28

spboyer merged commit 7fc7f07 into main Apr 21, 2026
4 of 5 checks passed

spboyer deleted the squad/137-output-contains-any branch April 21, 2026 18:28

This was referenced Apr 21, 2026

Add support for the TestExpectation model field MayInclude (which maps to the yaml output_contains_any field. #137

Closed

fix: CI integration test allows eval failures with mock executor #210

Merged

spboyer mentioned this pull request Feb 28, 2026

🎯 Waza Platform Roadmap - Tracking Issue #8

Closed

spboyer merged 2 commits into main from squad/137-output-contains-any

3 participants
