Bug: Diff grader reports fragments as missing when prompt grader confirms they exist
Summary
When running an evaluation task, the diff grader reports expected fragments as missing from a file, even though the prompt grader (LLM-as-judge) confirms the exact same changes are present and correct in the workspace.
Environment
- waza version: latest
- OS: Windows
- Model: claude-opus-4.6-1m
Task Definition
graders:
  - type: diff
    name: version_enum_check
    config:
      expected_files:
        - path: "Microsoft.Widget/Widget/main.tsp"
          contains:
            - "+@previewVersion"
            - '+v2025-05-04-preview: "2025-05-04-preview"'
  - type: prompt
    name: setup_check
    config:
      prompt: |
        A new preview version `2025-05-04-preview` has been added to the Versions enum in main.tsp.
        The new version is decorated with @previewVersion and has the correct version string.
        If all criteria are met, call set_waza_grade_pass.
        Otherwise, call set_waza_grade_fail with your reasoning.
      model: "gpt-4o-mini"
Grader Results
setup_check (prompt grader) — ✅ PASSED (score: 1.0)
feedback: "All prompts passed"
response: "✅ Pass — 2025-05-04-preview is properly added to the Versions enum
with @Azure.Core.previewVersion and the correct version string."
version_enum_check (diff grader) — ❌ FAILED (score: 0.33)
feedback: "File Microsoft.Widget/Widget/main.tsp missing expected fragment: @previewVersion;
File Microsoft.Widget/Widget/main.tsp missing expected fragment: v2025-05-04-preview: \"2025-05-04-preview\""
Expected Behavior
Since the prompt grader confirms @previewVersion and the version string are present in the file, the diff grader should also detect these fragments.
Possible Causes
I suspect the diff grader validates the pre-execution workspace files rather than the post-execution ones, so it never sees the changes made during the run.
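To help narrow this down, here is a minimal sketch of a manual check that can be run against both the original copy of the file and the copy under the reported workspace_dir. It is not waza's implementation. The helper names missing_fragments and check_workspace_file are hypothetical, and two behaviors are assumptions inferred from the feedback text above: that the leading "+" is stripped before matching, and that matching is a literal substring test.

```python
from pathlib import Path


def missing_fragments(text: str, fragments: list[str]) -> list[str]:
    """Return the fragments that do not occur literally in text."""
    missing = []
    for fragment in fragments:
        # Assumption: a leading "+" only marks an added line and is not
        # part of the text to match (the grader feedback omits it).
        needle = fragment[1:] if fragment.startswith("+") else fragment
        # Assumption: matching is a plain, case-sensitive substring test.
        if needle not in text:
            missing.append(needle)
    return missing


def check_workspace_file(
    workspace_dir: str, relative_path: str, fragments: list[str]
) -> list[str]:
    """Read one file under workspace_dir and report its missing fragments."""
    text = (Path(workspace_dir) / relative_path).read_text(encoding="utf-8")
    return missing_fragments(text, fragments)
```

If the check reports nothing missing for the post-execution copy but the grader still fails, that would support the pre-execution hypothesis. If it reports the same fragments as missing, the cause may instead be the literal form of the fragments: for example, a fully qualified decorator such as @Azure.Core.previewVersion (as quoted in the prompt grader's response) does not contain the substring @previewVersion.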
Full Test Result (relevant sections)
{
  "version_enum_check": {
    "type": "diff",
    "score": 0.3333333333333333,
    "passed": false,
    "feedback": "File Microsoft.Widget/Widget/main.tsp missing expected fragment: @previewVersion; File Microsoft.Widget/Widget/main.tsp missing expected fragment: v2025-05-04-preview: \"2025-05-04-preview\"",
    "details": {
      "expected_files": [
        {
          "contains": ["+@previewVersion", "+v2025-05-04-preview: \"2025-05-04-preview\""],
          "path": "Microsoft.Widget/Widget/main.tsp"
        }
      ],
      "failures": [
        "File Microsoft.Widget/Widget/main.tsp missing expected fragment: @previewVersion",
        "File Microsoft.Widget/Widget/main.tsp missing expected fragment: v2025-05-04-preview: \"2025-05-04-preview\""
      ],
      "workspace_dir": "C:\\Users\\HAOLIN~1\\AppData\\Local\\Temp\\waza-3490592053"
    }
  },
  "setup_check": {
    "type": "prompt",
    "score": 1,
    "passed": true,
    "feedback": "All prompts passed",
    "details": {
      "response": "✅ Pass — 2025-05-04-preview is properly added to the Versions enum with @Azure.Core.previewVersion and the correct version string."
    }
  }
}
Full output.json
claude-opus-4.6-1m.json