Comparing changes

…unner→EvalRunner (#166) (#222)

* refactor: complete vocabulary renames - BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166)

Rename Go identifiers to align with eval/task vocabulary:
- BenchmarkSpec → EvalSpec (models/spec.go)
- LoadBenchmarkSpec → LoadEvalSpec (models/spec.go)
- BenchmarkConfig → EvalConfig (config/config.go)
- NewBenchmarkConfig → NewEvalConfig (config/config.go)
- TestRunner → EvalRunner (orchestration/runner.go)
- TestStimulus → TaskStimulus (models/testcase.go)
- TestExpectation → TaskExpectation (models/testcase.go)

All YAML and JSON struct tags are unchanged for backward compatibility.
Type aliases and wrapper functions provided for all renamed exports.
TestCase intentionally NOT renamed (Go convention, too deeply embedded).

Updates README.md and AGENTS.md to reflect new names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update Linus history with #166 vocabulary renames

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: gofmt formatting in config.go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
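The backward-compatibility approach described in this commit (type aliases plus wrapper functions for renamed exports) might look roughly like the sketch below. The struct field, the loader signature, and the `Deprecated` comments are assumptions made for illustration; only the identifier names come from the commit message.

```go
package main

import "fmt"

// EvalSpec is the renamed type. The single field is hypothetical; the point
// is that struct tags stay unchanged so existing YAML/JSON files still parse.
type EvalSpec struct {
	Name string `yaml:"name" json:"name"`
}

// BenchmarkSpec is a type alias, so the old and new names are the same type.
//
// Deprecated: use EvalSpec.
type BenchmarkSpec = EvalSpec

// LoadEvalSpec is a stand-in loader with an assumed signature.
func LoadEvalSpec(name string) (*EvalSpec, error) {
	if name == "" {
		return nil, fmt.Errorf("spec name is empty")
	}
	return &EvalSpec{Name: name}, nil
}

// LoadBenchmarkSpec is a thin wrapper kept for callers using the old name.
//
// Deprecated: use LoadEvalSpec.
func LoadBenchmarkSpec(name string) (*BenchmarkSpec, error) {
	return LoadEvalSpec(name)
}

func main() {
	old, err := LoadBenchmarkSpec("code-explainer")
	if err != nil {
		panic(err)
	}
	// Assignment compiles without conversion because the alias is the same type.
	var renamed *EvalSpec = old
	fmt.Println(renamed.Name)
}
```

Because `BenchmarkSpec = EvalSpec` is an alias and not a new defined type, values move between old and new call sites without any conversion, which is what keeps existing callers compiling.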

#226)

* feat: support custom agent (.agent.md) file discovery and parsing #225

- Add AgentFrontmatter types in internal/skill/agent.go
- Extend loadSkillDefinition() to detect .agent.md files
- Extend discoverSkills() for agent file discovery
- Extend workspace detection for .agent.md
- Extend coverage command to include .agent.md files
- Add comprehensive tests for agent frontmatter parsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update squad history and decision for #225

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: document custom agent (.agent.md) eval support #225

- New guide: Evaluating Custom Agents with tool constraint validation
- Update eval-yaml guide: add agent targeting and custom agents section
- Update graders guide: add callout for auto-injected tool_constraint
- Update CLI reference: document .agent.md discovery in coverage and run
- Add custom-agents to sidebar navigation
- Update README.md with custom agents support note

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: auto-inject tool_constraint from agent frontmatter + custom-agent example #225

P1 scope for #225:
- Auto-inject tool_constraint grader when eval targets a .agent.md with tools field
- Skip injection if user already defined a tool_constraint grader (opt-out)
- Add LoadAgentDefinition() helper in internal/skill/agent.go
- Add examples/custom-agent/ with security-reviewer agent, tasks, and fixtures
- 9 new tests covering injection, opt-out, no-tools, non-agent, and missing file cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: exclude custom-agent fixture from Go build via build tag

The clean.go fixture imports a SQL driver to demonstrate parameterized
queries for the security-reviewer agent eval, but it isn't part of the
module build. Add //go:build ignore to keep `go test ./...` clean.

Also includes Livingston's history + decision file for the docs work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
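The injection rule from the P1 scope (inject a tool_constraint grader when the agent declares tools, skip when the user already defined one) can be sketched as follows. The `Grader` struct, the `injectToolConstraint` function, and the `output_contains` placeholder grader are hypothetical and are not taken from the repository.

```go
package main

import "fmt"

// Grader is a hypothetical, simplified grader entry from an eval spec.
type Grader struct {
	Type  string
	Tools []string
}

// injectToolConstraint appends a tool_constraint grader built from the
// agent's tools, unless the agent declares no tools or the user has already
// defined a tool_constraint grader (the opt-out case).
func injectToolConstraint(graders []Grader, agentTools []string) []Grader {
	if len(agentTools) == 0 {
		return graders // no tools field: nothing to enforce
	}
	for _, g := range graders {
		if g.Type == "tool_constraint" {
			return graders // user-defined grader wins
		}
	}
	return append(graders, Grader{Type: "tool_constraint", Tools: agentTools})
}

func main() {
	// "output_contains" is only a placeholder for some pre-existing grader.
	graders := []Grader{{Type: "output_contains"}}
	out := injectToolConstraint(graders, []string{"read", "search"})
	fmt.Println(len(out), out[len(out)-1].Type)
}
```

Treating an existing tool_constraint grader as an opt-out keeps the automatic behavior from overriding anything the eval author wrote explicitly.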

* fix: mock engine echoes file content so output_contains expectations work in CI

The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the
mock engine. The mock previously returned only "Mock response for: <prompt>"
+ a file count, so realistic output_contains expectations against file
contents (e.g., "async", "fetch" for fetch_user.js) failed every time.

Now the mock includes task metadata (name, description), context values,
file paths, and a 1KB content preview per resource. This lets evals
validate the full pipeline (discovery → execution → grading) in CI without
requiring a real model.

Closes #227

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update linus history and decision for mock engine change

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger waza-eval workflow on engine/orchestration/grader changes

The mock engine, runner, and graders all directly affect eval execution.
Without this, fixes like #227 wouldn't run the eval workflow on the PR
that introduced them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The MCP stdio server starts unconditionally alongside the HTTP dashboard
server. When waza serve runs in the background or with piped/closed stdin,
the MCP reader hits EOF and crashes the entire process, killing the HTTP
server.

Fix: only start MCP stdio when stdin is a terminal (term.IsTerminal).
The HTTP dashboard works independently without MCP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bumps [postcss](https://github.com/postcss/postcss) from 8.5.6 to 8.5.12.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.6...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

- Add .agent.md coverage to quick-start.mdx, getting-started.mdx,
  docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and
  eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: prepare release v0.31.0 + backfill CHANGELOG

- Bump version to 0.31.0 in version.txt and extension.yaml
- Backfill CHANGELOG.md for v0.25.0 through v0.30.1 (gap since [0.24.0])
- Add v0.31.0 release notes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update Livingston history with release v0.31.0 work

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Commits on Apr 22, 2026

Commits on Apr 28, 2026
