fix: CI integration test allows eval failures with mock executor #210

Merged: spboyer merged 1 commit into `main` from `squad/fix-ci-integration-test` on Apr 21, 2026
spboyer (Member) commented on Apr 21, 2026

Root cause: PR #203 (v0.27.0) wired up `evaluateExpectations()`, which made `output_contains` checks actually execute. Before that, these fields were defined but never evaluated, so the integration test passed only because expectations were silently skipped.

The mock executor outputs `Mock response for: {prompt}`, which doesn't contain the keywords that the example tasks expect (`recursive`, `factorial`, `async`, etc.). This is correct behavior: the test validates that waza runs without crashing, not that mock evals pass.
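
To make the mismatch concrete, here is a minimal sketch of a keyword check against mock-style output. It is not waza's implementation: `output_contains_all` is a hypothetical helper, the prompt text is made up, and case-sensitive substring matching is an assumption.

```shell
# Hypothetical helper (not part of waza): succeeds only if every
# keyword appears as a substring of the output. Matching is
# case-sensitive here, which is an assumption.
output_contains_all() {
  output=$1
  shift
  for keyword in "$@"; do
    case $output in
      *"$keyword"*) ;;      # keyword found, check the next one
      *) return 1 ;;        # keyword missing, expectation fails
    esac
  done
  return 0
}

# Output format taken from the PR description; the prompt is invented.
mock_output="Mock response for: Explain this function"

if output_contains_all "$mock_output" "recursive" "factorial"; then
  echo "expectation met"
else
  echo "expectation failed"
fi
```

With generic mock output, a check of this kind fails unless the prompt itself happens to contain every expected keyword.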

Fix: Exit code 1 (eval failures) is now allowed. Exit codes >1 (crashes, panics) still fail CI.
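
The exit-code policy can be sketched as a small shell wrapper. This is an illustration only: the actual workflow change is not shown in this conversation, `run_tolerating_eval_failures` is a hypothetical name, and `sh -c 'exit 1'` stands in for the real `waza run` invocation, whose arguments are not given here.

```shell
# Hypothetical wrapper (name and structure are assumptions):
# run a command and map its exit code to a CI verdict.
#   0  -> all evals passed
#   1  -> eval failures, tolerated
#   >1 -> crash or panic, fails the step
run_tolerating_eval_failures() {
  status=0
  # The `||` keeps a shell running with `set -e` from aborting
  # before the exit code can be inspected.
  "$@" || status=$?
  if [ "$status" -gt 1 ]; then
    echo "command crashed with exit code $status" >&2
    return "$status"
  fi
  echo "command completed with exit code $status"
  return 0
}

# Stand-in for the real command: simulates an eval failure.
run_tolerating_eval_failures sh -c 'exit 1'
```

Capturing the status explicitly, rather than appending `|| true`, keeps crashes visible: a process killed by a signal reports a status above 128 and still fails the step.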

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The integration test step runs `waza run` with the mock executor,
which produces generic output that won't match `output_contains`
expectations. This is expected: the test validates that waza
completes without crashing, not that mock evals pass.

Root cause: PR #203 (v0.27.0) wired up `evaluateExpectations()`, which
made `output_contains` checks actually execute. Before that, these
fields were defined but never evaluated, so the integration test
passed because the expectations were silently skipped.

Exit code 1 (eval failures) is now allowed. Exit codes >1 (crashes,
panics) still fail CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions (bot) enabled auto-merge (squash) on April 21, 2026 20:20
spboyer merged commit 75b2538 into `main` on Apr 21, 2026
6 checks passed
spboyer deleted the `squad/fix-ci-integration-test` branch on April 21, 2026 21:03