feat: support custom agent (.agent.md) file discovery and parsing #225 by spboyer · Pull Request #226 · microsoft/waza

spboyer · 2026-04-28T18:45:05Z

Closes #225

Summary

Adds support for evaluating VS Code custom agents (.agent.md files) alongside existing SKILL.md-based skills. Custom agents share the same Copilot engine and YAML-frontmatter / markdown-body structure but expose agent-specific frontmatter fields (tools, model, handoffs, mcp-servers, agents).
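The agent-specific keys might map onto Go structs roughly as below. This is a hypothetical sketch. The PR names `AgentFrontmatter`, `AgentHandoff`, and `AgentMCPServer`, but every field name, nested shape, and YAML tag here is an assumption, not the code in internal/skill/agent.go. That includes the `name`/`description` keys, which are assumed to be shared with SKILL.md. The sketch mainly shows that the hyphenated `mcp-servers` key needs an explicit struct tag.

```go
package main

import (
	"fmt"
	"reflect"
)

// AgentHandoff is a HYPOTHETICAL shape for one entry under `handoffs:`.
type AgentHandoff struct {
	Label  string `yaml:"label"`
	Agent  string `yaml:"agent"`
	Prompt string `yaml:"prompt"`
}

// AgentMCPServer is a HYPOTHETICAL shape for one entry under `mcp-servers:`.
type AgentMCPServer struct {
	Name    string   `yaml:"name"`
	Command string   `yaml:"command"`
	Args    []string `yaml:"args"`
}

// AgentFrontmatter sketches the agent frontmatter (field names assumed).
// Go identifiers cannot contain '-', so `mcp-servers` needs an explicit tag.
type AgentFrontmatter struct {
	Name        string           `yaml:"name"`
	Description string           `yaml:"description"`
	Tools       []string         `yaml:"tools"`
	Model       string           `yaml:"model"`
	Handoffs    []AgentHandoff   `yaml:"handoffs"`
	MCPServers  []AgentMCPServer `yaml:"mcp-servers"`
	Agents      []string         `yaml:"agents"`
}

// yamlKeys lists the YAML key each struct field is bound to, in declaration order.
func yamlKeys(v any) []string {
	t := reflect.TypeOf(v)
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		keys = append(keys, t.Field(i).Tag.Get("yaml"))
	}
	return keys
}

func main() {
	// prints [name description tools model handoffs mcp-servers agents]
	fmt.Println(yamlKeys(AgentFrontmatter{}))
}
```

Actual decoding would go through a YAML library's struct-tag support, which this sketch deliberately leaves out to stay dependency-free.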

What Changed

P0 — Discovery & loading

New internal/skill/agent.go — AgentFrontmatter, AgentHandoff, AgentMCPServer, ParseAgentFrontmatter, IsAgentFile, LoadAgentDefinition
loadSkillDefinition() (copilot.go) — falls back to .agent.md when no SKILL.md present
discoverSkills() (orchestration) — discovers .agent.md for skill injection
tryParseSkill() (workspace) — workspace detection picks up .agent.md
discoverSkillFiles() (cmd_coverage) — coverage grid includes agent files
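The discovery steps above depend on two small operations: recognizing an agent file by name, and separating the YAML frontmatter from the markdown body. The sketch below assumes a `---` fence layout like SKILL.md. `IsAgentFile` borrows the PR's function name, but its signature and case-insensitive matching are assumptions. `splitFrontmatter` is a hypothetical helper, not part of waza.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// IsAgentFile reports whether path names a VS Code custom agent file.
// ASSUMED signature: matches the ".agent.md" suffix case-insensitively.
func IsAgentFile(path string) bool {
	return strings.HasSuffix(strings.ToLower(filepath.Base(path)), ".agent.md")
}

// splitFrontmatter (HYPOTHETICAL) returns the lines between the opening and
// closing "---" fences as frontmatter, and everything after as the body.
func splitFrontmatter(content string) (frontmatter, body string, err error) {
	lines := strings.Split(strings.ReplaceAll(content, "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", errors.New("missing opening frontmatter fence")
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return strings.Join(lines[1:i], "\n"), strings.Join(lines[i+1:], "\n"), nil
		}
	}
	return "", "", errors.New("missing closing frontmatter fence")
}

func main() {
	// Illustrative content; the tool names are placeholders.
	doc := "---\ndescription: Reviews code for security issues\ntools: [read, search]\n---\n# Security Reviewer\n"
	fm, body, err := splitFrontmatter(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(IsAgentFile("security-reviewer.agent.md")) // true
	fmt.Printf("frontmatter: %q\nbody: %q\n", fm, body)
}
```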

P1 — Auto-injected tool_constraint grader

New internal/orchestration/agent_graders.go — augmentGradersFromAgent()
When an eval targets a .agent.md whose frontmatter declares tools: [...], an implicit tool_constraint grader is added with expect_tools populated from the frontmatter
Opt-out: if the user's eval.yaml already declares a tool_constraint grader, the implicit one is skipped
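The injection and opt-out rules can be sketched as follows. `augmentGradersFromAgent` is the function the PR names, but its signature here is hypothetical. So are the simplified `Grader` type and the `implicit-agent-tools` grader name. The sketch also assumes that an agent without `tools:` gets no implicit grader, consistent with the "declares tools" wording above.

```go
package main

import "fmt"

// Grader is a HYPOTHETICAL, simplified stand-in for an eval.yaml grader entry.
type Grader struct {
	Type        string
	Name        string
	ExpectTools []string
}

// augmentGradersFromAgent appends an implicit tool_constraint grader built from
// the agent's frontmatter tools, unless the agent declares no tools or the eval
// already defines its own tool_constraint grader (the opt-out).
func augmentGradersFromAgent(graders []Grader, agentTools []string) []Grader {
	if len(agentTools) == 0 {
		return graders
	}
	for _, g := range graders {
		if g.Type == "tool_constraint" {
			return graders // user-defined grader wins; skip the implicit one
		}
	}
	implicit := Grader{
		Type:        "tool_constraint",
		Name:        "implicit-agent-tools", // hypothetical name
		ExpectTools: append([]string(nil), agentTools...), // copy, don't alias
	}
	// Build a new slice so the caller's backing array is never mutated.
	out := make([]Grader, 0, len(graders)+1)
	out = append(out, graders...)
	return append(out, implicit)
}

func main() {
	base := []Grader{{Type: "text", Name: "mentions-sqli"}}
	fmt.Println(len(augmentGradersFromAgent(base, []string{"read", "search"}))) // 2
	userDefined := append(base, Grader{Type: "tool_constraint", Name: "mine"})
	fmt.Println(len(augmentGradersFromAgent(userDefined, []string{"read"}))) // 2
}
```

Copying the input slice keeps the function free of side effects, so the same parsed eval spec can be augmented repeatedly without accumulating duplicate graders.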

P1 — Example suite

examples/custom-agent/:

security-reviewer.agent.md — realistic security-review agent with tools: declared
eval.yaml — uses text + prompt graders (tool_constraint auto-injected)
tasks/ — 3 tasks: SQL injection, XSS, clean-code (negative case)
fixtures/ — vulnerable.py, xss.html, clean.go (excluded from the module build via //go:build ignore)
trigger_tests.yaml — should/shouldn't trigger prompts
README.md — walkthrough

Docs

New guide: site/src/content/docs/guides/custom-agents.mdx (Evaluating Custom Agents)
eval-yaml.mdx — added "Targeting Custom Agents" section
graders.mdx — auto-injection callout on tool_constraint
reference/cli.mdx — agent.md notes on waza run and waza coverage
Sidebar updated; README updated

Design decisions

SKILL.md wins when both files exist in the same directory (no behavior change for existing skills)
One agent per directory — first .agent.md match is used
Agents reuse SkillInfo — minimal blast radius, no parallel type hierarchy
Implicit tool_constraint is opt-out — declaring your own tool_constraint grader disables the implicit one

Testing

21 new tests across internal/skill, internal/orchestration, cmd/waza — all pass
Full go test ./... green
go vet ./... clean
Site builds (18 pages, including new custom-agents guide)

Out of Scope (future work for #225)

handoffs and mcp-servers frontmatter fields are parsed but not yet wired into evals (P2)
No special handoff testing yet (P2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add AgentFrontmatter types in internal/skill/agent.go
- Extend loadSkillDefinition() to detect .agent.md files
- Extend discoverSkills() for agent file discovery
- Extend workspace detection for .agent.md
- Extend coverage command to include .agent.md files
- Add comprehensive tests for agent frontmatter parsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- New guide: Evaluating Custom Agents with tool constraint validation
- Update eval-yaml guide: add agent targeting and custom agents section
- Update graders guide: add callout for auto-injected tool_constraint
- Update CLI reference: document .agent.md discovery in coverage and run
- Add custom-agents to sidebar navigation
- Update README.md with custom agents support note

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nt example #225

P1 scope for #225:
- Auto-inject tool_constraint grader when eval targets a .agent.md with tools field
- Skip injection if user already defined a tool_constraint grader (opt-out)
- Add LoadAgentDefinition() helper in internal/skill/agent.go
- Add examples/custom-agent/ with security-reviewer agent, tasks, and fixtures
- 9 new tests covering injection, opt-out, no-tools, non-agent, and missing file cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The clean.go fixture imports a SQL driver to demonstrate parameterized queries for the security-reviewer agent eval, but it isn't part of the module build. Add //go:build ignore to keep `go test ./...` clean. Also includes Livingston's history + decision file for the docs work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
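`//go:build ignore` works because no build ever sets an `ignore` tag, so the constraint never evaluates to true. `ignore` is a convention, not a special keyword; any unsatisfied tag would have the same effect. The standard library's `go/build/constraint` package can demonstrate this. The `included` helper and the example tag set below are illustrative assumptions.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// included reports whether a file carrying the given //go:build line would be
// compiled when exactly the tags in set are satisfied.
func included(line string, set map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	// Example tags a linux/amd64 `go test ./...` might satisfy; none is "ignore".
	tags := map[string]bool{"linux": true, "amd64": true, "gc": true}
	ok, err := included("//go:build ignore", tags)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // false: the fixture is skipped
}
```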

spboyer · 2026-04-28T19:05:30Z

CI status — all change-related checks pass ✅

Check	Status
Build and Verify Docker Image	✅ pass
Lint	✅ pass
test	✅ pass
ubuntu-latest	✅ pass
windows-latest	✅ pass
license/cla	✅ pass
Run Waza Evaluation	❌ pre-existing failure (see #227)

The "Run Waza Evaluation" failure is not caused by this PR. I reproduced the identical failure on a fresh main clone — examples/code-explainer/eval.yaml returns 0% pass rate on main as well. Filed as #227. Likely fallout from the recent BenchmarkSpec→EvalSpec rename refactor (#222).

Local verification on this branch:

go test ./... — all pass
go vet ./... — clean
cd site && npm run build — 18 pages built successfully (incl. new custom-agents guide)

Ready for review.

- Add .agent.md coverage to quick-start.mdx, getting-started.mdx, docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI added 2 commits April 28, 2026 14:42

docs: update squad history and decision for #225

91e52b4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot enabled auto-merge (squash) April 28, 2026 18:45

Copilot AI added 3 commits April 28, 2026 14:55

spboyer mentioned this pull request Apr 28, 2026

bug: Waza Evaluation CI fails on main — code-explainer mock eval returns 0% pass rate #227

Closed

github-actions Bot merged commit 653a54e into main Apr 28, 2026
6 of 7 checks passed

spboyer mentioned this pull request Apr 28, 2026

docs: cross-reference audit for recent renames and feature additions #230

Merged

spboyer mentioned this pull request Apr 28, 2026

Release v0.31.0 #231

Merged

5 tasks

spboyer mentioned this pull request May 20, 2026

feat: Support VS Code custom agent (.agent.md) evaluation #225

Closed

github-actions[bot] merged 5 commits into main from squad/225-custom-agent-eval