
Comparing changes

base repository: microsoft/waza
base: v0.30.1
head repository: microsoft/waza
compare: v0.31.0
  • 7 commits
  • 84 files changed
  • 3 contributors

Commits on Apr 22, 2026

  1. refactor: complete vocabulary renames — BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166) (#222)
    
    * refactor: complete vocabulary renames — BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166)
    
    Rename Go identifiers to align with eval/task vocabulary:
    - BenchmarkSpec → EvalSpec (models/spec.go)
    - LoadBenchmarkSpec → LoadEvalSpec (models/spec.go)
    - BenchmarkConfig → EvalConfig (config/config.go)
    - NewBenchmarkConfig → NewEvalConfig (config/config.go)
    - TestRunner → EvalRunner (orchestration/runner.go)
    - TestStimulus → TaskStimulus (models/testcase.go)
    - TestExpectation → TaskExpectation (models/testcase.go)
    
    All YAML and JSON struct tags are unchanged for backward compatibility.
    Type aliases and wrapper functions provided for all renamed exports.
    TestCase intentionally NOT renamed (Go convention, too deeply embedded).
    Updates README.md and AGENTS.md to reflect new names.
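
The backward-compatibility approach described above (type aliases plus wrapper functions) could look roughly like the following. This is a minimal sketch, not the code from the commit: the `Name` field, its tags, and the JSON-based loader signature are assumptions made for illustration. Only the identifier pairs come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalSpec is the new name. The field and its tags are hypothetical;
// the point is that the tags stay the same across the rename.
type EvalSpec struct {
	Name string `json:"name" yaml:"name"`
}

// BenchmarkSpec is a type alias, so old and new names are the same type.
//
// Deprecated: use EvalSpec.
type BenchmarkSpec = EvalSpec

// LoadEvalSpec decodes a spec from JSON (a stand-in for the real loader,
// whose signature is not shown in the source).
func LoadEvalSpec(data []byte) (*EvalSpec, error) {
	var spec EvalSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, err
	}
	return &spec, nil
}

// LoadBenchmarkSpec is a thin wrapper kept for existing callers.
//
// Deprecated: use LoadEvalSpec.
func LoadBenchmarkSpec(data []byte) (*BenchmarkSpec, error) {
	return LoadEvalSpec(data)
}

func main() {
	old, err := LoadBenchmarkSpec([]byte(`{"name":"code-explainer"}`))
	if err != nil {
		panic(err)
	}
	// No conversion is needed: the alias and the new name are identical types.
	var renamed *EvalSpec = old
	fmt.Println(renamed.Name)
}
```

Because an alias introduces no new type, values flow between old and new call sites without conversions, which is what lets the rename land without breaking downstream code.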
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: update Linus history with #166 vocabulary renames
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: gofmt formatting in config.go
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 22, 2026
    5c2ee8f

Commits on Apr 28, 2026

  1. feat: support custom agent (.agent.md) file discovery and parsing #225 (#226)
    
    * feat: support custom agent (.agent.md) file discovery and parsing #225
    
    - Add AgentFrontmatter types in internal/skill/agent.go
    - Extend loadSkillDefinition() to detect .agent.md files
    - Extend discoverSkills() for agent file discovery
    - Extend workspace detection for .agent.md
    - Extend coverage command to include .agent.md files
    - Add comprehensive tests for agent frontmatter parsing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: update squad history and decision for #225
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: document custom agent (.agent.md) eval support #225
    
    - New guide: Evaluating Custom Agents with tool constraint validation
    - Update eval-yaml guide: add agent targeting and custom agents section
    - Update graders guide: add callout for auto-injected tool_constraint
    - Update CLI reference: document .agent.md discovery in coverage and run
    - Add custom-agents to sidebar navigation
    - Update README.md with custom agents support note
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: auto-inject tool_constraint from agent frontmatter + custom-agent example #225
    
    P1 scope for #225:
    - Auto-inject tool_constraint grader when eval targets a .agent.md with tools field
    - Skip injection if user already defined a tool_constraint grader (opt-out)
    - Add LoadAgentDefinition() helper in internal/skill/agent.go
    - Add examples/custom-agent/ with security-reviewer agent, tasks, and fixtures
    - 9 new tests covering injection, opt-out, no-tools, non-agent, and missing file cases
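
The injection and opt-out rules above can be sketched as a small pure function. The `Grader` and `AgentDefinition` shapes here are hypothetical and far simpler than whatever the project defines. Only the behavior (inject when the agent lists tools, skip when the user already has a `tool_constraint` grader) is taken from the commit message.

```go
package main

import "fmt"

// Grader is a hypothetical, simplified grader config.
type Grader struct {
	Type         string
	AllowedTools []string
}

// AgentDefinition is a hypothetical view of parsed .agent.md frontmatter.
type AgentDefinition struct {
	Tools []string // the "tools" field, if present
}

// injectToolConstraint returns the graders plus a tool_constraint grader
// derived from the agent's tools. It returns the input unchanged when the
// eval does not target an agent, the agent lists no tools, or the user
// already defined a tool_constraint grader (opt-out).
func injectToolConstraint(graders []Grader, agent *AgentDefinition) []Grader {
	if agent == nil || len(agent.Tools) == 0 {
		return graders
	}
	for _, g := range graders {
		if g.Type == "tool_constraint" {
			return graders // the user-defined grader wins
		}
	}
	// Copy first so the caller's slice is never modified through aliasing.
	out := append([]Grader(nil), graders...)
	return append(out, Grader{Type: "tool_constraint", AllowedTools: agent.Tools})
}

func main() {
	agent := &AgentDefinition{Tools: []string{"read", "grep"}}
	graders := []Grader{{Type: "output_contains"}}
	for _, g := range injectToolConstraint(graders, agent) {
		fmt.Println(g.Type, g.AllowedTools)
	}
}
```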
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: exclude custom-agent fixture from Go build via build tag
    
    The clean.go fixture imports a SQL driver to demonstrate parameterized queries
    for the security-reviewer agent eval, but it isn't part of the module build.
    Add //go:build ignore to keep `go test ./...` clean.
    
    Also includes Livingston's history + decision file for the docs work.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 28, 2026
    653a54e
  2. fix: mock engine echoes file content for CI evals (#227) (#228)

    * fix: mock engine echoes file content so output_contains expectations work in CI
    
    The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the mock
    engine. The mock previously returned only "Mock response for: <prompt>" + a
    file count, so realistic output_contains expectations against file contents
    (e.g., "async", "fetch" for fetch_user.js) failed every time.
    
    Now the mock includes task metadata (name, description), context values, file
    paths, and a 1KB content preview per resource. This lets evals validate the
    full pipeline (discovery → execution → grading) in CI without requiring a
    real model.
    
    Closes #227
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: update linus history and decision for mock engine change
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * ci: trigger waza-eval workflow on engine/orchestration/grader changes
    
    The mock engine, runner, and graders all directly affect eval execution.
    Without this, fixes like #227 wouldn't run the eval workflow on the PR
    that introduced them.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 28, 2026
    dfec036
  3. fix: waza serve crashes when stdin is not a terminal (#224)

    The MCP stdio server starts unconditionally alongside the HTTP
    dashboard server. When waza serve runs in the background or with
    piped/closed stdin, the MCP reader hits EOF and crashes the entire
    process — killing the HTTP server.
    
    Fix: only start MCP stdio when stdin is a terminal (term.IsTerminal).
    The HTTP dashboard works independently without MCP.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 28, 2026
    c79eea1
  4. chore(deps): Bump postcss from 8.5.6 to 8.5.12 in /site (#229)

    Bumps [postcss](https://github.com/postcss/postcss) from 8.5.6 to 8.5.12.
    - [Release notes](https://github.com/postcss/postcss/releases)
    - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
    - [Commits](postcss/postcss@8.5.6...8.5.12)
    
    ---
    updated-dependencies:
    - dependency-name: postcss
      dependency-version: 8.5.12
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Apr 28, 2026
    c51c10b
  5. docs: audit and update for #222/#226/#228 cross-references (#230)

    - Add .agent.md coverage to quick-start.mdx, getting-started.mdx,
      docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
    - Add custom-agent, required-skills-demo, rubrics to examples/README.md
    - Update mock engine description in docs/INTEGRATION-TESTING.md and
      eval-yaml.mdx to reflect #228 file content echo behavior
    - No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 28, 2026
    6956f85
  6. Release v0.31.0 (#231)

    * chore: prepare release v0.31.0 + backfill CHANGELOG
    
    - Bump version to 0.31.0 in version.txt and extension.yaml
    - Backfill CHANGELOG.md for v0.25.0 through v0.30.1 (gap since [0.24.0])
    - Add v0.31.0 release notes
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: update Livingston history with release v0.31.0 work
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 28, 2026
    bf77c75