Comparing changes
base repository: microsoft/waza
base: v0.30.1
head repository: microsoft/waza
compare: v0.31.0
- 7 commits
- 84 files changed
- 3 contributors
Commits on Apr 22, 2026
- 5c2ee8f refactor: complete vocabulary renames - BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166) (#222)

  * refactor: complete vocabulary renames - BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166)

    Rename Go identifiers to align with eval/task vocabulary:
    - BenchmarkSpec → EvalSpec (models/spec.go)
    - LoadBenchmarkSpec → LoadEvalSpec (models/spec.go)
    - BenchmarkConfig → EvalConfig (config/config.go)
    - NewBenchmarkConfig → NewEvalConfig (config/config.go)
    - TestRunner → EvalRunner (orchestration/runner.go)
    - TestStimulus → TaskStimulus (models/testcase.go)
    - TestExpectation → TaskExpectation (models/testcase.go)

    All YAML and JSON struct tags are unchanged for backward compatibility. Type aliases and wrapper functions provided for all renamed exports. TestCase intentionally NOT renamed (Go convention, too deeply embedded). Updates README.md and AGENTS.md to reflect new names.

  * docs: update Linus history with #166 vocabulary renames

  * fix: gofmt formatting in config.go

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
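
The backward-compatibility approach named in commit 5c2ee8f (type aliases plus wrapper functions for renamed exports) could look roughly like the sketch below. Only the identifier names come from the commit message; the struct field, the `LoadEvalSpec` signature, and its body are assumptions made for illustration and are not taken from the waza source.

```go
package main

import (
	"errors"
	"fmt"
)

// EvalSpec is the new name. The single field is hypothetical; its tags
// illustrate that YAML/JSON struct tags do not change with the rename.
type EvalSpec struct {
	Name string `yaml:"name" json:"name"`
}

// BenchmarkSpec is kept as a type alias, so old and new names denote the
// same type and values can be used interchangeably.
//
// Deprecated: use EvalSpec.
type BenchmarkSpec = EvalSpec

// LoadEvalSpec is a stand-in loader (assumed signature): it only builds a
// spec from a name so the example stays self-contained.
func LoadEvalSpec(name string) (*EvalSpec, error) {
	if name == "" {
		return nil, errors.New("spec name must not be empty")
	}
	return &EvalSpec{Name: name}, nil
}

// LoadBenchmarkSpec is a thin wrapper that forwards to the renamed function.
//
// Deprecated: use LoadEvalSpec.
func LoadBenchmarkSpec(name string) (*BenchmarkSpec, error) {
	return LoadEvalSpec(name)
}

func main() {
	old, err := LoadBenchmarkSpec("demo")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	// No conversion is needed: *BenchmarkSpec and *EvalSpec are identical types.
	var spec *EvalSpec = old
	fmt.Println(spec.Name)
}
```

Because an alias introduces no new type, callers that still use the old names keep compiling without conversions, which is presumably why the commit pairs aliases with forwarding wrappers rather than defining new types.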
Commits on Apr 28, 2026
- 653a54e feat: support custom agent (.agent.md) file discovery and parsing #225 (#226)

  * feat: support custom agent (.agent.md) file discovery and parsing #225
    - Add AgentFrontmatter types in internal/skill/agent.go
    - Extend loadSkillDefinition() to detect .agent.md files
    - Extend discoverSkills() for agent file discovery
    - Extend workspace detection for .agent.md
    - Extend coverage command to include .agent.md files
    - Add comprehensive tests for agent frontmatter parsing

  * docs: update squad history and decision for #225

  * docs: document custom agent (.agent.md) eval support #225
    - New guide: Evaluating Custom Agents with tool constraint validation
    - Update eval-yaml guide: add agent targeting and custom agents section
    - Update graders guide: add callout for auto-injected tool_constraint
    - Update CLI reference: document .agent.md discovery in coverage and run
    - Add custom-agents to sidebar navigation
    - Update README.md with custom agents support note

  * feat: auto-inject tool_constraint from agent frontmatter + custom-agent example #225

    P1 scope for #225:
    - Auto-inject tool_constraint grader when eval targets a .agent.md with tools field
    - Skip injection if user already defined a tool_constraint grader (opt-out)
    - Add LoadAgentDefinition() helper in internal/skill/agent.go
    - Add examples/custom-agent/ with security-reviewer agent, tasks, and fixtures
    - 9 new tests covering injection, opt-out, no-tools, non-agent, and missing file cases

  * fix: exclude custom-agent fixture from Go build via build tag

    The clean.go fixture imports a SQL driver to demonstrate parameterized queries for the security-reviewer agent eval, but it isn't part of the module build. Add //go:build ignore to keep `go test ./...` clean. Also includes Livingston's history + decision file for the docs work.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
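
The auto-injection rule described in commit 653a54e (add a tool_constraint grader when the eval targets a .agent.md with a tools field, unless the user already defined one) can be sketched as follows. The `AgentFrontmatter` name and the `tool_constraint` grader type come from the commit message; the struct fields, the `Grader` type, the helper names, and the tool names are hypothetical and may differ from the real code in internal/skill/agent.go.

```go
package main

import (
	"fmt"
	"strings"
)

// AgentFrontmatter mirrors the type named in the commit; the fields here
// are assumptions for this sketch.
type AgentFrontmatter struct {
	Name  string
	Tools []string
}

// Grader is a hypothetical, simplified grader definition.
type Grader struct {
	Type         string
	AllowedTools []string
}

// isAgentFile reports whether the eval target looks like a custom agent file.
func isAgentFile(path string) bool {
	return strings.HasSuffix(path, ".agent.md")
}

// injectToolConstraint returns the grader list to use for an eval run.
// It appends a tool_constraint grader only when the target is a .agent.md
// file, the agent declares tools, and no tool_constraint grader exists yet.
func injectToolConstraint(graders []Grader, targetPath string, agent *AgentFrontmatter) []Grader {
	if !isAgentFile(targetPath) || agent == nil || len(agent.Tools) == 0 {
		return graders // non-agent target, missing file, or no tools field
	}
	for _, g := range graders {
		if g.Type == "tool_constraint" {
			return graders // user-defined grader acts as the opt-out
		}
	}
	// Copy before appending so the caller's slice is never modified.
	out := make([]Grader, 0, len(graders)+1)
	out = append(out, graders...)
	return append(out, Grader{Type: "tool_constraint", AllowedTools: agent.Tools})
}

func main() {
	agent := &AgentFrontmatter{Name: "security-reviewer", Tools: []string{"read", "search"}}
	graders := injectToolConstraint(nil, "security-reviewer.agent.md", agent)
	fmt.Printf("%+v\n", graders)
}
```

The early returns correspond to the cases the commit lists for its tests: injection, opt-out, no-tools, non-agent, and missing file.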
- dfec036 fix: mock engine echoes file content for CI evals (#227) (#228)

  * fix: mock engine echoes file content so output_contains expectations work in CI

    The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the mock engine. The mock previously returned only "Mock response for: <prompt>" + a file count, so realistic output_contains expectations against file contents (e.g., "async", "fetch" for fetch_user.js) failed every time.

    Now the mock includes task metadata (name, description), context values, file paths, and a 1KB content preview per resource. This lets evals validate the full pipeline (discovery → execution → grading) in CI without requiring a real model.

    Closes #227

  * docs: update linus history and decision for mock engine change

  * ci: trigger waza-eval workflow on engine/orchestration/grader changes

    The mock engine, runner, and graders all directly affect eval execution. Without this, fixes like #227 wouldn't run the eval workflow on the PR that introduced them.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- c79eea1 fix: waza serve crashes when stdin is not a terminal (#224)

  The MCP stdio server starts unconditionally alongside the HTTP dashboard server. When waza serve runs in the background or with piped/closed stdin, the MCP reader hits EOF and crashes the entire process, killing the HTTP server.

  Fix: only start MCP stdio when stdin is a terminal (term.IsTerminal). The HTTP dashboard works independently without MCP.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- c51c10b chore(deps): Bump postcss from 8.5.6 to 8.5.12 in /site (#229)

  Bumps [postcss](https://github.com/postcss/postcss) from 8.5.6 to 8.5.12.
  - [Release notes](https://github.com/postcss/postcss/releases)
  - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
  - [Commits](postcss/postcss@8.5.6...8.5.12)

  ---
  updated-dependencies:
  - dependency-name: postcss
    dependency-version: 8.5.12
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- 6956f85 docs: audit and update for #222/#226/#228 cross-references (#230)

  - Add .agent.md coverage to quick-start.mdx, getting-started.mdx, docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
  - Add custom-agent, required-skills-demo, rubrics to examples/README.md
  - Update mock engine description in docs/INTEGRATION-TESTING.md and eval-yaml.mdx to reflect #228 file content echo behavior
  - No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- bf77c75

  * chore: prepare release v0.31.0 + backfill CHANGELOG
    - Bump version to 0.31.0 in version.txt and extension.yaml
    - Backfill CHANGELOG.md for v0.25.0 through v0.30.1 (gap since [0.24.0])
    - Add v0.31.0 release notes

  * docs: update Livingston history with release v0.31.0 work

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff v0.30.1...v0.31.0