Release v0.31.0 by spboyer · Pull Request #231 · microsoft/waza

spboyer · 2026-04-28T19:56:04Z

Release v0.31.0

Added

Custom agent (.agent.md) eval support: discovers .agent.md files alongside SKILL.md, parses agent-specific frontmatter (tools, model, handoffs, mcp-servers, agents), and auto-injects a tool_constraint grader from the agent's tools: field. Includes a complete worked example under examples/custom-agent/ and a new "Evaluating Custom Agents" docs guide (#226, closes #225)
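To illustrate the tools: to tool_constraint flow described above, here is a minimal Go sketch. It assumes the .agent.md file opens with a `---` delimited frontmatter block and that tools: is either an inline list or a block list of `- item` lines. `GraderSpec`, `frontmatterTools`, `cleanItem`, and `toolConstraint` are hypothetical names for illustration, not waza's actual types or functions, and the hand-rolled parsing covers only a small subset of YAML.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// GraderSpec is a hypothetical stand-in for a grader definition; the real
// waza type and its field names may differ.
type GraderSpec struct {
	Type         string
	AllowedTools []string
}

// cleanItem trims whitespace and surrounding quotes from one list item.
func cleanItem(s string) string {
	return strings.Trim(strings.TrimSpace(s), `"'`)
}

// frontmatterTools extracts the tools: list from a leading "---" delimited
// frontmatter block. It handles only an inline list (tools: [a, b]) or a
// block list of "- item" lines.
func frontmatterTools(doc string) []string {
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return nil // no frontmatter at the top of the file
	}
	var tools []string
	inTools := false
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "---" {
			break // end of frontmatter
		}
		if rest, ok := strings.CutPrefix(line, "tools:"); ok {
			rest = strings.TrimSpace(rest)
			if rest == "" {
				inTools = true // a block list follows on later lines
				continue
			}
			for _, item := range strings.Split(strings.Trim(rest, "[]"), ",") {
				if t := cleanItem(item); t != "" {
					tools = append(tools, t)
				}
			}
			continue
		}
		if inTools {
			if item, ok := strings.CutPrefix(line, "- "); ok {
				if t := cleanItem(item); t != "" {
					tools = append(tools, t)
				}
				continue
			}
			inTools = false // any other line ends the block list
		}
	}
	return tools
}

// toolConstraint builds the grader to inject, or nil when the agent
// declares no tools and there is nothing to constrain.
func toolConstraint(tools []string) *GraderSpec {
	if len(tools) == 0 {
		return nil
	}
	return &GraderSpec{Type: "tool_constraint", AllowedTools: tools}
}

func main() {
	agent := "---\ndescription: Reviews pull requests\ntools: ['search', 'fetch']\nmodel: example-model\n---\nYou are a reviewer.\n"
	if g := toolConstraint(frontmatterTools(agent)); g != nil {
		fmt.Printf("%s grader allows %v\n", g.Type, g.AllowedTools)
	}
}
```

A real implementation would more likely decode the frontmatter with a YAML library; the string handling here only keeps the sketch dependency-free.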

Fixed

Mock engine echoes file content: _output_contains expectations against file contents now work in CI without a real model. The mock response includes task metadata, file paths, and a 1KB content preview per resource (#228, closes #227)
waza serve no longer crashes when stdin is not a terminal: the MCP stdio server only starts when term.IsTerminal() reports true, so piped input or background mode no longer kills the HTTP dashboard (#224)
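The mock engine fix can be pictured with a short Go sketch: build a response that echoes the task name, each file path, and a preview of each file capped at 1KB, so a substring expectation has something to match. All names here (`mockResponse`, `preview`, `outputContains`, `previewLimit`) and the response layout are assumptions for illustration and are not taken from the waza source.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode/utf8"
)

// previewLimit is the assumed 1KB cap on echoed content per resource.
const previewLimit = 1024

// preview returns at most previewLimit bytes of content, backing up to a
// rune boundary so a multi-byte UTF-8 character is never cut in half.
func preview(content string) string {
	if len(content) <= previewLimit {
		return content
	}
	cut := previewLimit
	for cut > 0 && !utf8.RuneStart(content[cut]) {
		cut--
	}
	return content[:cut]
}

// mockResponse builds a deterministic response containing the task name,
// every file path (sorted), and a capped preview of each file.
func mockResponse(task string, files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	fmt.Fprintf(&b, "mock response for task %q\n", task)
	for _, p := range paths {
		fmt.Fprintf(&b, "file: %s\n%s\n", p, preview(files[p]))
	}
	return b.String()
}

// outputContains is a hypothetical stand-in for a substring expectation
// evaluated against the engine output.
func outputContains(output, want string) bool {
	return strings.Contains(output, want)
}

func main() {
	out := mockResponse("code-explainer", map[string]string{
		"main.go": "package main\n\nfunc main() {}\n",
	})
	fmt.Println(outputContains(out, "func main()"))
}
```

Sorting the paths keeps the mock output stable across runs, which matters when CI compares results between builds.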

Changed

Vocabulary renames: internal types renamed BenchmarkSpec → EvalSpec and TestRunner → EvalRunner. Not a breaking change for external consumers, since the types live in internal/ (#222)

Documentation

Cross-reference audit for recent renames and the custom agent feature (#230)

Dependencies

Bump postcss from 8.5.6 to 8.5.12 in /site (#229)

Backfilled changelog entries

This release also backfills changelog entries for v0.25.0 through v0.30.1 — versions that shipped between 2026-04-21 and 2026-04-22 without changelog updates. The backfill is curated from git history per Keep-a-Changelog conventions.

Release process

After merge, push the tag to trigger the unified release workflow (release.yml):

git tag v0.31.0
git push origin v0.31.0

The workflow will: build CLI binaries (6 platforms), build + pack the azd extension, create GitHub releases, publish to azd registry, and sync registry.json.

Checklist

Version bumped in version.txt and extension.yaml
CHANGELOG.md updated (incl. backfilled gap)
go test ./... passes
go vet ./... clean
Site builds (npm run build in /site)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chore: prepare release v0.31.0 + backfill CHANGELOG

3689422

- Bump version to 0.31.0 in version.txt and extension.yaml
- Backfill CHANGELOG.md for v0.25.0 through v0.30.1 (gap since [0.24.0])
- Add v0.31.0 release notes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot enabled auto-merge (squash) April 28, 2026 19:56

docs: update Livingston history with release v0.31.0 work

c444b13

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot merged commit bf77c75 into main Apr 28, 2026
5 checks passed

github-actions Bot deleted the release/v0.31.0 branch April 28, 2026 20:09

spboyer mentioned this pull request May 20, 2026

feat: Support VS Code custom agent (.agent.md) evaluation #225

Closed

github-actions[bot] merged 2 commits into main from release/v0.31.0
