Comparing changes
base repository: microsoft/waza
base: v0.26.0
head repository: microsoft/waza
compare: v0.27.0
- 5 commits
- 18 files changed
- 2 contributors
Commits on Apr 21, 2026
fix: webserver test skips when frontend assets not built (#204)
* fix: webserver test skips when frontend assets not built

The TestIndexHTMLReferencesExistingAssets test only checked whether the assets/ directory existed, not whether it contained actual JS/CSS bundles. On Windows CI (and in any environment where npm run build hasn't run), the directory could exist empty or hold only non-bundle files. The test then proceeded and failed when the SPA fallback served index.html instead of the expected asset content types. The test now also verifies that assets/ contains at least one .js or .css file before proceeding, and otherwise skips with a clear message.

* fix: spelling typo + Windows-compatible absolute path test

- Fix 'artefact' → 'artifact' misspelling in webserver test
- Use runtime.GOOS to pick a platform-correct absolute path in suggest test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit: 0cfa604
feat: add tool_calls grader (#187) (#202)
* feat: add tool_calls grader (#187)

Adds a new tool_calls grader that validates which tools an agent called during execution. Supports four constraint types:
- required_tools: tools that must appear in the session
- forbidden_tools: tools that must not appear in the session
- min_calls: minimum total tool call count
- max_calls: maximum total tool call count

Partial scoring: score = passed_checks / total_checks. Each constraint counts as one check. The constructor validates parameters at creation time.

Includes 25 tests covering constructor validation, required/forbidden tools, call count bounds, combined checks, partial scoring, details output, edge cases, and factory integration.

Closes #187

* docs: add tool_calls grader to site docs and schema reference (#187)

- Add tool_calls to at-a-glance table in graders.mdx
- Add full Tool Calls section with config options, examples, and comparison tip
- Add tool_calls to grader type enum in schema.mdx
- Site builds clean

* fix: errcheck lint violation in tool_calls grader test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit: 7bd731f
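A sketch of the scoring rule described above, under stated assumptions: ToolCallsConfig, NewToolCallsGrader, and Grade are hypothetical names, not waza's API. "Each constraint counts as one check" is read here as one check per configured constraint type (so at most four), and the specific constructor rules (at least one constraint, non-negative counts, min_calls <= max_calls) are assumptions about what "validates parameters" covers.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCallsConfig is a hypothetical mirror of the four constraint types.
// Nil MinCalls/MaxCalls mean "not configured".
type ToolCallsConfig struct {
	RequiredTools  []string
	ForbiddenTools []string
	MinCalls       *int
	MaxCalls       *int
}

type ToolCallsGrader struct {
	cfg ToolCallsConfig
}

// NewToolCallsGrader rejects invalid configs at creation time (assumed rules).
func NewToolCallsGrader(cfg ToolCallsConfig) (*ToolCallsGrader, error) {
	if len(cfg.RequiredTools) == 0 && len(cfg.ForbiddenTools) == 0 &&
		cfg.MinCalls == nil && cfg.MaxCalls == nil {
		return nil, errors.New("tool_calls: at least one constraint must be set")
	}
	if (cfg.MinCalls != nil && *cfg.MinCalls < 0) || (cfg.MaxCalls != nil && *cfg.MaxCalls < 0) {
		return nil, errors.New("tool_calls: min_calls and max_calls must be >= 0")
	}
	if cfg.MinCalls != nil && cfg.MaxCalls != nil && *cfg.MinCalls > *cfg.MaxCalls {
		return nil, errors.New("tool_calls: min_calls exceeds max_calls")
	}
	return &ToolCallsGrader{cfg: cfg}, nil
}

// Grade scores the tool names called in one session and returns
// passed_checks / total_checks plus a message for each failed check.
func (g *ToolCallsGrader) Grade(calls []string) (float64, []string) {
	seen := make(map[string]bool, len(calls))
	for _, c := range calls {
		seen[c] = true
	}
	passed, total := 0, 0
	var failures []string
	check := func(ok bool, msg string) {
		total++
		if ok {
			passed++
		} else {
			failures = append(failures, msg)
		}
	}
	if len(g.cfg.RequiredTools) > 0 {
		var missing []string
		for _, t := range g.cfg.RequiredTools {
			if !seen[t] {
				missing = append(missing, t)
			}
		}
		check(len(missing) == 0, fmt.Sprintf("missing required tools %v", missing))
	}
	if len(g.cfg.ForbiddenTools) > 0 {
		var used []string
		for _, t := range g.cfg.ForbiddenTools {
			if seen[t] {
				used = append(used, t)
			}
		}
		check(len(used) == 0, fmt.Sprintf("called forbidden tools %v", used))
	}
	if g.cfg.MinCalls != nil {
		check(len(calls) >= *g.cfg.MinCalls, fmt.Sprintf("%d calls < min_calls %d", len(calls), *g.cfg.MinCalls))
	}
	if g.cfg.MaxCalls != nil {
		check(len(calls) <= *g.cfg.MaxCalls, fmt.Sprintf("%d calls > max_calls %d", len(calls), *g.cfg.MaxCalls))
	}
	// total > 0 is guaranteed by the constructor.
	return float64(passed) / float64(total), failures
}
```

With all four constraints configured, a session that misses a required tool, uses a forbidden one, and exceeds max_calls but satisfies min_calls would score 1/4 = 0.25.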
feat: allow task prompt to be loaded from file (#157) (#200)
* feat: allow task prompt to be loaded from file (#157)

Add a prompt_file field to TestStimulus as an alternative to an inline prompt. When prompt_file is set, the file content is read and used as the prompt message. The path is resolved relative to the directory of the task YAML file.

Validation:
- Error if both prompt and prompt_file are set
- Error if prompt_file doesn't exist

Includes 7 test cases covering file load, subdirectory paths, mutual exclusivity, missing file, inline fallback, empty inputs, and multiline prompts.

Updates task.schema.json with the prompt_file property and a oneOf constraint.

* fix: reject Unix-style absolute paths on Windows in normalizeGeneratedPath

filepath.IsAbs does not recognize /etc/evil.yaml as absolute on Windows because it lacks a drive letter. Add an explicit check for a leading / to catch cross-platform absolute path injection.

* fix: address review feedback on prompt_file (#157)

- Reject absolute paths and path traversal (security, per runner pattern)
- Clear MessageFile after resolving to avoid leaking paths in serialized output
- Add minLength: 1 to prompt_file in JSON schema
- Add 3 new tests: absolute path, path traversal, MessageFile clearing

* docs: add prompt_file to eval-yaml guide and schema reference (#157)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit: 0540774
feat: implement max_response_time_ms behavior rule (#201)
* feat: implement max_response_time_ms behavior rule (#136)

Add a MaxResponseTimeMs field to BehaviorRules and implement a timing compliance check in ComputeBehaviorMetrics.

Changes:
- Add MaxResponseTimeMs int64 to BehaviorRules (testcase.go)
- Add MaxResponseTimeMs, ActualResponseTimeMs, and MaxResponseTimeMsPassed to BehaviorMetrics (behavior.go)
- Check run.DurationMs <= rules.MaxResponseTimeMs when set
- Include MaxResponseTimeMsPassed in AllConstraintsPassed()
- Update computeEfficiency from 4 categories × 0.25 to 5 categories × 0.20
- Add max_response_time_ms to JSON schema (task.schema.json)
- Add 4 new test cases: under/at/over limit, combined failure
- Update expected efficiency scores in existing tests

* docs: add max_response_time_ms to eval-yaml guide and schema reference

Document the new behavior rule field in both the Writing Eval Specs guide and the YAML Schema reference page. Includes a field table, usage examples, and a description of efficiency scoring.

* fix: gofmt formatting for behavior metrics files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit: 6fb2f65
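A sketch of the timing rule and the reweighted efficiency score, under stated assumptions. BehaviorRules and BehaviorMetrics reuse the field names from the commit, but checkResponseTime and efficiencyScore are hypothetical helpers. The commit does not name the five efficiency categories or say whether each is pass/fail, so this treats each as a boolean worth an equal 1/5 (0.20), and treats a zero limit as "not set".

```go
package main

// BehaviorRules keeps only the field relevant here; 0 means "no limit".
type BehaviorRules struct {
	MaxResponseTimeMs int64
}

// BehaviorMetrics records the configured limit, the observed duration,
// and whether the run complied.
type BehaviorMetrics struct {
	MaxResponseTimeMs       int64
	ActualResponseTimeMs    int64
	MaxResponseTimeMsPassed bool
}

// checkResponseTime passes when no limit is set, or when the run's duration
// is at or under the limit (durationMs <= MaxResponseTimeMs).
func checkResponseTime(rules BehaviorRules, durationMs int64) BehaviorMetrics {
	m := BehaviorMetrics{
		MaxResponseTimeMs:       rules.MaxResponseTimeMs,
		ActualResponseTimeMs:    durationMs,
		MaxResponseTimeMsPassed: true,
	}
	if rules.MaxResponseTimeMs > 0 {
		m.MaxResponseTimeMsPassed = durationMs <= rules.MaxResponseTimeMs
	}
	return m
}

// efficiencyScore weights every category equally. With five categories,
// each passed category contributes 0.20 (previously 4 × 0.25).
func efficiencyScore(categoryPassed []bool) float64 {
	if len(categoryPassed) == 0 {
		return 0
	}
	passed := 0
	for _, ok := range categoryPassed {
		if ok {
			passed++
		}
	}
	return float64(passed) / float64(len(categoryPassed))
}
```

Under this reading, a run that fails only the response-time rule drops from 1.0 to 0.8, which is why existing tests needed updated expected scores.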
feat: add output_contains_any expectation field (#203)
* feat: add output_contains_any expectation field (#137)

Add MayInclude (output_contains_any) to TestExpectation; it passes when ANY of the listed strings appears in the agent output. This completes the expectation-level text check trio alongside the existing MustInclude (output_contains) and MustExclude (output_not_contains).

Also wires up all three expectation fields in RunAll via the new evaluateExpectations helper; these fields were previously defined but never evaluated.

Closes #137

* fix: gofmt formatting and misspelling in run.go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit: 7fc7f07
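A sketch of how the three expectation fields could be evaluated together. TestExpectation, MustInclude, MustExclude, MayInclude, and evaluateExpectations are names from the commit, but this signature and the failure messages are assumptions, and matching is assumed to be case-sensitive substring search.

```go
package main

import (
	"fmt"
	"strings"
)

type TestExpectation struct {
	MustInclude []string // output_contains: every string must appear
	MustExclude []string // output_not_contains: no string may appear
	MayInclude  []string // output_contains_any: at least one must appear
}

// evaluateExpectations returns one failure message per unmet expectation;
// an empty result means the output satisfied all three fields.
func evaluateExpectations(exp TestExpectation, output string) []string {
	var failures []string
	for _, s := range exp.MustInclude {
		if !strings.Contains(output, s) {
			failures = append(failures, fmt.Sprintf("output missing %q", s))
		}
	}
	for _, s := range exp.MustExclude {
		if strings.Contains(output, s) {
			failures = append(failures, fmt.Sprintf("output contains forbidden %q", s))
		}
	}
	if len(exp.MayInclude) > 0 {
		found := false
		for _, s := range exp.MayInclude {
			if strings.Contains(output, s) {
				found = true
				break
			}
		}
		if !found {
			failures = append(failures, fmt.Sprintf("output contains none of %q", exp.MayInclude))
		}
	}
	return failures
}
```

An empty MayInclude list is treated as "no constraint" rather than as an automatic failure, so tasks that never set output_contains_any behave as before.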
To view the full diff locally, run:
git diff v0.26.0...v0.27.0