fix: prompt grader gracefully recovers when follow-up turn fails after grades collected by sebastienlevert · Pull Request #251 · microsoft/waza

sebastienlevert · 2026-05-20T19:02:20Z

Summary

Fixes #250 — the prompt grader now gracefully recovers when the Copilot SDK's follow-up turn fails after grade tool results are sent back to the model.

Problem

SendAndWait returns an error when the model fails to respond to the follow-up turn after set_waza_grade_pass/set_waza_grade_fail tool results are sent back. The error was propagated unconditionally, discarding the already-collected grades. This caused evaluations to report score=0.00 and status=error even though the judge successfully graded every criterion.

Fix

After SendAndWait returns an error, check if grade tool calls were already collected in wazaTools.Passes/wazaTools.Failures. If they were, log a warning and continue with the actual scores instead of failing. Also handles the nil resp case since SendAndWait returns (nil, error).
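
The recovery decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual waza code: the `gradeTools` and `gradeCall` types and the `recoverSendError` helper are hypothetical stand-ins. Only the `Passes`/`Failures` field names and the "failed to send prompt" wrapping are taken from this PR's description and logs.

```go
package main

import (
	"errors"
	"fmt"
)

// gradeCall is a hypothetical stand-in for one recorded grade tool call.
type gradeCall struct {
	Criterion string
}

// gradeTools is a hypothetical stand-in for wazaTools. Only the Passes and
// Failures field names come from the PR description.
type gradeTools struct {
	Passes   []gradeCall
	Failures []gradeCall
}

// errSession imitates the session error reported in the PR.
var errSession = errors.New("session error: Failed to get response from the AI model")

// recoverSendError decides what to do with an error from the send step.
// It returns nil when at least one grade was already collected (the error
// is downgraded), and the wrapped error otherwise.
func recoverSendError(tools *gradeTools, err error) error {
	if err == nil {
		return nil
	}
	if len(tools.Passes)+len(tools.Failures) > 0 {
		// The real grader would log a warning here before continuing.
		return nil
	}
	return fmt.Errorf("failed to send prompt: %w", err)
}

func main() {
	graded := &gradeTools{Passes: make([]gradeCall, 3), Failures: make([]gradeCall, 2)}
	fmt.Println("recovered:", recoverSendError(graded, errSession) == nil)

	empty := &gradeTools{}
	fmt.Println("propagated:", recoverSendError(empty, errSession) != nil)
}
```

The error is only swallowed when there is something to report. With no grades collected, the caller still sees the original failure.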

Changes

internal/graders/prompt_grader.go:
- Check len(wazaTools.Passes) + len(wazaTools.Failures) > 0 before propagating error
- Guard resp.Data.Content access for nil resp
- Log warning with pass/fail counts for observability
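
The nil guard and the warning from the list above might look roughly like this. The `response` and `responseData` types and both helpers are hypothetical; only the `resp.Data.Content` access path and the warning text come from this PR. Using `log/slog` is an assumption, since the PR does not say which logger waza uses.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// responseData and response are hypothetical stand-ins for the SDK response.
// Only the access path resp.Data.Content is taken from the PR description.
type responseData struct{ Content string }
type response struct{ Data responseData }

// contentOrEmpty reads the response content without dereferencing a nil
// response, which is what the send call yields alongside an error.
func contentOrEmpty(resp *response) string {
	if resp == nil {
		return ""
	}
	return resp.Data.Content
}

// renderWarning logs the recovery warning with pass/fail counts into a
// buffer and returns the formatted line, so the output can be inspected.
func renderWarning(passes, failures int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	logger.Warn("prompt grader: ignoring post-grade session error (grades already collected)",
		"passes", passes, "failures", failures)
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", contentOrEmpty(nil))
	fmt.Printf("%q\n", contentOrEmpty(&response{Data: responseData{Content: "done"}}))
	fmt.Print(renderWarning(3, 2))
}
```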

Testing

Verified on Windows 11 with waza v0.31.0 (built from source with this patch):

Before patch: score=0.00, status=error — grades discarded

[ERROR] running graders: failed to run grader rubric_judge: failed to send prompt: 
session error: Failed to get response from the AI model; retried 5 times

After patch: score=0.60 — grades recovered (3 pass, 2 fail)

WARN "prompt grader: ignoring post-grade session error (grades already collected)" passes=3 failures=2
[GRADER] rubric_judge score=0.60 (39.835s)
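
The recovered score is consistent with a simple pass fraction: 3 passes out of 5 graded criteria gives 0.60. The sketch below assumes that formula. The PR does not state how waza computes the score, and `passFraction` is a hypothetical helper.

```go
package main

import "fmt"

// passFraction returns passes / (passes + failures), or 0 when nothing was
// graded. This formula is an assumption that matches the numbers in the PR.
func passFraction(passes, failures int) float64 {
	total := passes + failures
	if total == 0 {
		return 0
	}
	return float64(passes) / float64(total)
}

func main() {
	fmt.Printf("score=%.2f\n", passFraction(3, 2)) // prints score=0.60
}
```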

Note

The pairwise grader (runPairwiseOnce) has the same SendAndWait + error pattern and would benefit from the same fix. I kept this PR focused on the gradeIndependent path where I could reproduce and verify the fix.

…r grades collected

The Copilot SDK unconditionally sends tool results back to the model after set_waza_grade_pass/fail tool calls, starting a follow-up assistant turn. When that turn fails ('Failed to get response from the AI model'), SendAndWait returns an error, but the grades were already collected in wazaTools.Passes/Failures.

Before this fix, the error was propagated and all grade data was discarded (score=0.00, status=error). Now, if grades were already collected, the post-grade session error is logged as a warning and the actual scores are returned.

Also handles the nil resp case when recovering from the error, since SendAndWait returns (nil, error).

Fixes microsoft#250

codecov-commenter · 2026-05-20T19:05:53Z

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@63c4908). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
internal/graders/prompt_grader.go	0.00%	8 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #251   +/-   ##
=======================================
  Coverage        ?   75.69%           
=======================================
  Files           ?      152           
  Lines           ?    17627           
  Branches        ?        0           
=======================================
  Hits            ?    13342           
  Misses          ?     3356           
  Partials        ?      929

Flag	Coverage Δ
go-implementation	`75.69% <0.00%> (?)`


sebastienlevert requested a review from spboyer as a code owner May 20, 2026 19:02

github-actions Bot enabled auto-merge (squash) May 20, 2026 19:02

github-actions Bot merged commit e136ce3 into microsoft:main May 20, 2026
8 checks passed

This was referenced May 20, 2026

Release v0.32.0 #252

Merged

bug: prompt grader discards collected grades when follow-up turn fails after tool results #250

Closed

Release v0.33.0 #264

Merged

github-actions[bot] merged 1 commit into microsoft:main from sebastienlevert:fix/prompt-grader-graceful-recovery
