bug: prompt grader discards collected grades when follow-up turn fails after tool results

## Summary

The prompt grader discards already-collected grades when the Copilot SDK's follow-up turn fails after tool results are sent back to the model. This causes evaluations to report `score=0.00` and `status=error` even though the judge model successfully graded every criterion.

## Reproduction

1. Create an eval with a `prompt` grader using `continue_session: true`
2. Run it with `waza run eval.yaml --verbose --debug`
3. Observe in debug output:
   - Judge model responds with `set_waza_grade_pass` / `set_waza_grade_fail` tool calls (all collected successfully)
   - SDK sends tool results back to the model
   - Model starts a new `assistant.turn_start` (follow-up turn)
   - Follow-up turn fails: `Failed to get response from the AI model; retried 5 times`
   - `SendAndWait` returns error
   - `gradeIndependent` propagates the error, discarding the grades

## Debug Event Timeline

```
T14:45:33  tool.execution_complete   <- All 5 grade tools completed successfully
T14:45:33  assistant.turn_end        <- Judge turn ended
T14:45:33  assistant.turn_start      <- SDK starts ANOTHER turn (sending tool results back)
T14:45:37  session.info              <- 5 retries, ~4s apart
T14:45:41  session.info
T14:45:44  session.info
T14:45:48  session.info
T14:45:52  session.info
T14:45:56  session.error             <- "Failed to get response from the AI model"
```

## Root Cause

In `prompt_grader.go:gradeIndependent()`, the error check after `SendAndWait` does not inspect whether grade tool calls were already collected:

```go
resp, err := session.SendAndWait(ctx, copilot.MessageOptions{...})
if err != nil {
    return nil, fmt.Errorf("failed to send prompt: %w", err)  // grades discarded
}
```

The Copilot SDK (`copilot-sdk/go`) unconditionally sends tool results back to the model via `HandlePendingToolCall`. The LLM API protocol requires the model to respond after receiving tool results, so the SDK starts another assistant turn. When that follow-up turn fails (rate limiting, context window limits, or a transient error), `SendAndWait` returns an error, even though the grade data has already been recorded in `wazaTools.Passes` and `wazaTools.Failures`.

## Impact

- **Every prompt grader invocation** is at risk: whenever the follow-up turn fails, the collected grades are lost
- Scores that should be 0.60, 0.80, or 1.00 are reported as 0.00
- The `pairwise` grader follows the same pattern (`runPairwiseOnce`)
- The failure is intermittent, since it depends on whether the follow-up model call succeeds

## Environment

- waza v0.31.0
- copilot-sdk/go v0.1.32
- Windows 11, Copilot CLI 1.0.51-2
- Models tested: claude-sonnet-4.5, claude-opus-4.6 (both exhibit the issue)
