fix: prompt grader gracefully recovers when follow-up turn fails after grades collected #251
Merged
github-actions[bot] merged 1 commit on May 20, 2026
…r grades collected
The Copilot SDK unconditionally sends tool results back to the model
after set_waza_grade_pass/fail tool calls, starting a follow-up
assistant turn. When that turn fails ('Failed to get response from
the AI model'), SendAndWait returns an error — but the grades were
already collected in wazaTools.Passes/Failures.
Before this fix, the error was propagated and all grade data was
discarded (score=0.00, status=error). Now, if grades were already
collected, the post-grade session error is logged as a warning and
the actual scores are returned.
Also handles the nil resp case when recovering from the error, since
SendAndWait returns (nil, error).
Fixes microsoft#250
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #251 +/- ##
=======================================
Coverage ? 75.69%
=======================================
Files ? 152
Lines ? 17627
Branches ? 0
=======================================
Hits ? 13342
Misses ? 3356
Partials ? 929
Summary
Fixes #250 — the prompt grader now gracefully recovers when the Copilot SDK's follow-up turn fails after grade tool results are sent back to the model.
Problem
SendAndWait returns an error when the model fails to respond to the follow-up turn after set_waza_grade_pass/set_waza_grade_fail tool results are sent back. The error was propagated unconditionally, discarding the already-collected grades. This caused evaluations to report score=0.00 and status=error even though the judge successfully graded every criterion.
Fix
After SendAndWait returns an error, check if grade tool calls were already collected in wazaTools.Passes/wazaTools.Failures. If they were, log a warning and continue with the actual scores instead of failing. Also handles the nil resp case since SendAndWait returns (nil, error).
Changes
internal/graders/prompt_grader.go:
- Check len(wazaTools.Passes) + len(wazaTools.Failures) > 0 before propagating the error
- Guard the resp.Data.Content access for a nil resp
Testing
Verified on Windows 11 with waza v0.31.0 (built from source with this patch):
Before patch:
score=0.00, status=error (grades discarded)
After patch:
score=0.60 (grades recovered: 3 pass, 2 fail)
Note
The pairwise grader (runPairwiseOnce) has the same SendAndWait + error pattern and would benefit from the same fix. I kept this PR focused on the gradeIndependent path where I could reproduce and verify the fix.