Comparing changes
base repository: microsoft/waza
base: v0.25.0
head repository: microsoft/waza
compare: v0.26.0
- 18 commits
- 47 files changed
- 9 contributors
Commits on Apr 21, 2026
fix: macOS install + trigger test off-by-1 count (#164, #184) (#193)
* fix: install.sh uses shasum on macOS when sha256sum unavailable (#164)

The install script was failing on macOS because it prioritized sha256sum over shasum. While sha256sum exists on some macOS systems (via Homebrew), the BSD version doesn't support the -c flag needed for checksum verification.

This fix:
- Prioritizes shasum (native on macOS, supports -c flag)
- Falls back to sha256sum only if it supports the -c flag
- Exits with an error if no compatible utility is found (rather than skipping verification with a warning)

Fixes #164

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: trigger test result count off by 1 (#184)

ComputeTriggerMetrics used weighted sums (confidence-adjusted) for the integer TP/FP/TN/FN counts. Medium-confidence prompts contributed 0.5 instead of 1.0, so groups with 6 high + 2 medium prompts reported 7 instead of 8.

Fix: track actual result counts for the integer fields while keeping weighted values for precision/recall/F1/accuracy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit 9ff4bb3

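The off-by-one described in the trigger-metrics fix can be illustrated with a minimal sketch. The type and field names here (`promptResult`, `triggerMetrics`, `WeightedTP`) are hypothetical and are not taken from the waza source; only the 1.0/0.5 weights and the "6 high + 2 medium" scenario come from the commit message.

```go
package main

import "fmt"

// promptResult is a hypothetical record of one trigger-test prompt.
type promptResult struct {
	ShouldTrigger bool    // expected outcome
	Triggered     bool    // observed outcome
	Weight        float64 // 1.0 for high confidence, 0.5 for medium
}

// triggerMetrics keeps integer counts separate from weighted sums.
type triggerMetrics struct {
	TP, FP, TN, FN int     // actual number of results in each bucket
	WeightedTP     float64 // confidence-adjusted sum, used only for ratios
	Precision      float64
	Recall         float64
}

func computeTriggerMetrics(results []promptResult) triggerMetrics {
	var m triggerMetrics
	var wFP, wFN float64
	for _, r := range results {
		switch {
		case r.ShouldTrigger && r.Triggered:
			m.TP++ // count every result as exactly one
			m.WeightedTP += r.Weight
		case !r.ShouldTrigger && r.Triggered:
			m.FP++
			wFP += r.Weight
		case r.ShouldTrigger && !r.Triggered:
			m.FN++
			wFN += r.Weight
		default:
			m.TN++
		}
	}
	// Ratios keep using the weighted values.
	if d := m.WeightedTP + wFP; d > 0 {
		m.Precision = m.WeightedTP / d
	}
	if d := m.WeightedTP + wFN; d > 0 {
		m.Recall = m.WeightedTP / d
	}
	return m
}

func main() {
	var results []promptResult
	for i := 0; i < 6; i++ {
		results = append(results, promptResult{true, true, 1.0})
	}
	for i := 0; i < 2; i++ {
		results = append(results, promptResult{true, true, 0.5})
	}
	m := computeTriggerMetrics(results)
	fmt.Println(m.TP, m.WeightedTP) // prints "8 7"
}
```

Using the weighted sum (7) as the integer count is what produced the reported discrepancy; the count of 8 is the number of prompts actually evaluated.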
docs: update demo guide and add CI/CD integration guide (#112, #89) (#194)
- Fix DEMO-SCRIPT.md to match current CLI commands
  - Remove references to 'waza generate' command (doesn't exist)
  - Replace with 'waza new skill' and 'waza new eval'
  - Remove outdated flags: --log, --suggestions, --trials, --fail-threshold
  - Replace with current flags: --session-log, --session-dir, --task, --parallel
  - Update Part 5+ sections to reflect current CLI behavior
- Add comprehensive CI/CD integration guide (docs/CI-CD-GUIDE.md)
  - GitHub Actions examples (basic, multi-model, baseline comparison)
  - Azure DevOps pipeline examples
  - Secrets management for both platforms
  - Best practices: caching, quality gates, parallel execution, logging
  - Troubleshooting guide with common issues
  - Advanced workflows: approval gates, trend tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit cd914c6

fix: validate grader config required fields (#195)
* fix: validate grader config required fields (#113)

Grader configurations now validate type-specific required fields at parse time:
- code graders require at least one assertion in config.assertions
- diff graders require at least one file in config.expected_files
- json_schema graders require config.schema or config.schema_file
- program graders require config.command
- trigger graders require config.skill_path
- action_sequence graders require config.expected_actions
- skill_invocation graders require config.required_skills
- tool_constraint graders require config.expect_tools or config.reject_tools
- file graders require at least one of must_exist, must_not_exist, or content_patterns

The strict YAML parser (KnownFields) already catches fields at the wrong nesting level. This change adds semantic validation to catch graders with empty/missing required fields.

Validation is enforced in both GraderConfig (spec-level) and ValidatorInline (task-level) graders via their UnmarshalYAML methods.

Fixes #113

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: run gofmt on spec.go and testcase.go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update test fixtures for file/diff grader required config validation

Test fixtures for coverage report tests used file and diff graders without required config fields, causing parse failures after the grader config validation added in #113. Updated fixtures to include valid config (must_exist and expected_files respectively).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Shayne Boyer <spboyer@users.noreply.github.com>
Commit 900c10f

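The per-type required-field rules listed in this commit could be expressed as a lookup table, as in the sketch below. This is an illustration only: the `graderConfig` struct, the `requiredAny` table, and the `validate` function are hypothetical names, the config is modeled as a plain map rather than parsed YAML, and the real change hooks into `UnmarshalYAML`, which is not reproduced here.

```go
package main

import "fmt"

// graderConfig is a hypothetical, simplified stand-in for a parsed grader entry.
type graderConfig struct {
	Type   string
	Config map[string]any
}

// requiredAny maps a grader type to config keys of which at least one
// must be present and non-empty (field names taken from the commit message).
var requiredAny = map[string][]string{
	"code":             {"assertions"},
	"diff":             {"expected_files"},
	"json_schema":      {"schema", "schema_file"},
	"program":          {"command"},
	"trigger":          {"skill_path"},
	"action_sequence":  {"expected_actions"},
	"skill_invocation": {"required_skills"},
	"tool_constraint":  {"expect_tools", "reject_tools"},
	"file":             {"must_exist", "must_not_exist", "content_patterns"},
}

// isEmpty reports whether a config value carries no usable content.
func isEmpty(v any) bool {
	switch x := v.(type) {
	case nil:
		return true
	case string:
		return x == ""
	case []any:
		return len(x) == 0
	case []string:
		return len(x) == 0
	case map[string]any:
		return len(x) == 0
	}
	return false
}

// validate returns an error when none of the required keys is set.
func validate(g graderConfig) error {
	keys, ok := requiredAny[g.Type]
	if !ok {
		return nil // no required fields known for this type
	}
	for _, k := range keys {
		if v, present := g.Config[k]; present && !isEmpty(v) {
			return nil
		}
	}
	return fmt.Errorf("%s grader requires at least one of %v in config", g.Type, keys)
}

func main() {
	bad := graderConfig{Type: "code", Config: map[string]any{"assertions": []any{}}}
	good := graderConfig{Type: "file", Config: map[string]any{"must_exist": []any{"README.md"}}}
	fmt.Println(validate(bad) != nil, validate(good) == nil) // prints "true true"
}
```

A table keeps the rules in one place, so adding a grader type means adding one entry rather than another branch.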
fix: diff grader reads post-execution workspace files (#165) (#196)
The diff grader was reading workspace files from the filesystem after the Copilot SDK session had disconnected. Since session.Disconnect() may restore workspace files to their pre-execution state, the grader would see the original file content instead of the agent's modifications.

Fix: capture all workspace file contents into memory before the session disconnects, and have the diff grader prefer these captured files over the on-disk workspace. This guarantees graders always see the true post-execution state regardless of SDK disconnect behavior.

Changes:
- Add WorkspaceFiles field to ExecutionResponse and graders.Context
- Add captureWorkspaceFiles() that snapshots workspace before disconnect
- Add readWorkspaceFile() to diff grader that prefers captured files over filesystem reads, with forward-slash normalization for cross-platform consistency
- Add tests for the capture function and grader behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit f1a0fe6

chore(deps): Bump smol-toml from 1.6.0 to 1.6.1 in /site (#158)
Bumps [smol-toml](https://github.com/squirrelchat/smol-toml) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/squirrelchat/smol-toml/releases)
- [Commits](squirrelchat/smol-toml@v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: smol-toml
  dependency-version: 1.6.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 2d03487

chore(deps): Bump picomatch in /site (#159)
Bumps [picomatch](https://github.com/micromatch/picomatch). These dependencies needed to be updated together.

Updates `picomatch` from 4.0.3 to 4.0.4
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](micromatch/picomatch@4.0.3...4.0.4)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](micromatch/picomatch@4.0.3...4.0.4)

---
updated-dependencies:
- dependency-name: picomatch
  dependency-version: 4.0.4
  dependency-type: indirect
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 08b024f

chore(deps): Bump picomatch from 4.0.3 to 4.0.4 in /web (#160)
Bumps [picomatch](https://github.com/micromatch/picomatch) from 4.0.3 to 4.0.4.
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](micromatch/picomatch@4.0.3...4.0.4)

---
updated-dependencies:
- dependency-name: picomatch
  dependency-version: 4.0.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit d2cc5b0

chore(deps): Bump astro from 5.17.3 to 5.18.1 in /site (#163)
Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) from 5.17.3 to 5.18.1.
- [Release notes](https://github.com/withastro/astro/releases)
- [Changelog](https://github.com/withastro/astro/blob/astro@5.18.1/packages/astro/CHANGELOG.md)
- [Commits](https://github.com/withastro/astro/commits/astro@5.18.1/packages/astro)

---
updated-dependencies:
- dependency-name: astro
  dependency-version: 5.18.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 2dc6f07

chore(deps): Bump vite from 6.4.1 to 6.4.2 in /site (#182)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.4.1 to 6.4.2.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.4.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 78902cd

chore(deps): Bump go.opentelemetry.io/otel/sdk from 1.42.0 to 1.43.0 (#185)
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.43.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 13f7624

chore(deps-dev): Bump vite from 6.4.1 to 6.4.2 in /web (#192)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.4.1 to 6.4.2.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.4.2
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit abad478

chore(deps): Bump defu from 6.1.4 to 6.1.6 in /site (#181)
Bumps [defu](https://github.com/unjs/defu) from 6.1.4 to 6.1.6.
- [Release notes](https://github.com/unjs/defu/releases)
- [Changelog](https://github.com/unjs/defu/blob/main/CHANGELOG.md)
- [Commits](unjs/defu@v6.1.4...v6.1.6)

---
updated-dependencies:
- dependency-name: defu
  dependency-version: 6.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commit 2ce74fe

Make it so the debug logging is more useful (#152)
* Make it so the debug logging is more useful.
  - Omits some entries, like report_intent, that just add noise to the debugging process. Had to have some state to do this since only the tool.execution_start indicates that it's a report_intent event.
  - Dive into the arguments, input and context parameters, which usually contain important information.
  - Output the selected model, and the producer (ie: the agent)
* Whoops, they're debug now
* Addressing all the copilot feedback, and restructuring the code that does the event testing a bit to make it more readable.

---------

Co-authored-by: Richard Park <ripark@microsoft.com>
Commit 9309566

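The commit notes that suppressing report_intent noise needs state, because only the tool.execution_start event identifies the tool. One way that might look is sketched below. Everything except the `tool.execution_start` and `report_intent` names is an assumption: the `event` struct, the `ToolCallID` correlation field, the `tool.execution_complete` event type, and the `noiseFilter` type are hypothetical and do not describe the SDK's real event shape.

```go
package main

import "fmt"

// event is a hypothetical, simplified session event.
type event struct {
	Type       string // e.g. "tool.execution_start"
	ToolCallID string // assumed correlation ID shared by related events
	ToolName   string // assumed to be set only on tool.execution_start
}

// noiseFilter remembers which tool calls belong to noisy tools.
type noiseFilter struct {
	suppressed map[string]bool
}

func newNoiseFilter() *noiseFilter {
	return &noiseFilter{suppressed: make(map[string]bool)}
}

// shouldLog reports whether an event is worth writing to the debug log.
func (f *noiseFilter) shouldLog(e event) bool {
	if e.Type == "tool.execution_start" && e.ToolName == "report_intent" {
		// Later events for this call carry no tool name, so remember the ID.
		f.suppressed[e.ToolCallID] = true
		return false
	}
	if e.ToolCallID != "" && f.suppressed[e.ToolCallID] {
		return false
	}
	return true
}

func main() {
	f := newNoiseFilter()
	events := []event{
		{Type: "tool.execution_start", ToolCallID: "1", ToolName: "report_intent"},
		{Type: "tool.execution_complete", ToolCallID: "1"},
		{Type: "tool.execution_start", ToolCallID: "2", ToolName: "edit_file"},
		{Type: "tool.execution_complete", ToolCallID: "2"},
	}
	for _, e := range events {
		if f.shouldLog(e) {
			fmt.Println(e.Type, e.ToolCallID)
		}
	}
}
```

Without the remembered ID, the follow-up events for a report_intent call would be indistinguishable from those of any other tool.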
run --output-dir groups files by timestamp (#153)
* `run --output-dir` groups files by timestamp
* tweak the docs
Commit 5890653

fix: --discover finds eval.yaml in project-root evals/{name}/ layout (#44)
* Initial plan

* fix: --discover finds eval.yaml in project-layout evals/{name}/ directory

Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>

* fix: avoid exact path comparison in TestDiscoverProjectLayout (Windows symlink short paths)

Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>
Commit c99a318

fix: update jsonrpc test fixture for grader validation (#113)
The grader config validation from PR #195 correctly rejects code graders without assertions. Updated the test eval YAML to include a config.assertions field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit b2fe957

docs: add cache command, prompt mode, and complete schema reference (#198)
- Add waza cache clear command to CLI reference with flags and examples
- Add mode field to Prompt grader in graders guide (independent/pairwise)
- Add missing config fields to schema reference: max_attempts, group_by, fail_fast, skill_directories, required_skills, mcp_servers
- All 16 documentation pages build successfully

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit 015392f

test: add coverage for suggest and jsonrpc packages (#199)
* test: add coverage for suggest and jsonrpc packages

Boost suggest from 68% to 83% and jsonrpc from 69% to 90%.

suggest package:
- grader_docs_test.go: GraderSummaries format, LoadGraderDocs with fstest.MapFS (nil, empty, mixed valid/invalid, whitespace trimming)
- prompt_test.go: renderSelectionPrompt, renderImplementationPrompt, renderPrompt with various data shapes and empty fields
- helpers_test.go: orDefault, phrasesToText, summarizeBody, extractYAML, normalizeGeneratedPath, filterValidGraderTypes, parseGraderSelection edge cases
- resolve_test.go: resolveSkillFile, loadSkill, buildPromptData, WriteToDir edge cases (empty paths, traversal, invalid YAML)

jsonrpc package:
- methods_test.go: MethodRegistry CRUD, overwrite, empty name, handler error/params, RegisterHandlers verification
- handlers_extra_test.go: task.list/task.get success paths, run.cancel success/already-completed, eval.* edge cases, TCP listener start/close/serve, malformed YAML validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add coverage for storage and mcp packages

- storage: 39.1% → 54.6% (new tests for azure_blob pure functions, store helpers, local.go edge cases)
- mcp: 58.3% → 89.4% (new tests for task_list, quickLinkCheck, resolveDir, ServeStdio, hasIDField, dispatchTool, skill check)

Replaced all skipped mock tests in azure_blob_test.go with real tests for sanitizePathSegment, stringPtr, getMetadata, isCI, blobToResultSummary, outcomeToResultSummary, and NewAzureBlobStore validation.

Note: storage cannot reach 70% without extracting a blob client interface from AzureBlobStore. The remaining 0% functions (Upload, Download, List, Compare, findBlobBySuffix, findBlobByMetadata) all call *azblob.Client directly. A follow-up PR can introduce the interface for full testability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address errcheck lint violations in storage tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address Copilot review feedback: remove sleep, use bufio scanner, fix paths

- Replace time.Sleep with immediate connection (listener already active)
- Use bufio.Scanner for newline-delimited JSON instead of raw conn.Read
- Replace hard-coded absolute path with t.TempDir()-based path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: cross-platform path tests and errcheck lint issues

- Use filepath.FromSlash for path assertions in normalizeGeneratedPath tests
- Use runtime.GOOS to pick platform-appropriate absolute path in rejection test
- Satisfy errcheck linter by explicitly discarding os.Setenv/Unsetenv/WriteFile errors in tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remaining errcheck violations in mcp coverage tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit 8a6129a

You can try running this command locally to see the comparison on your machine:
git diff v0.25.0...v0.26.0