
feat: support custom agent (.agent.md) file discovery and parsing #225 (PR #226)

Merged: github-actions[bot] merged 5 commits into main from squad/225-custom-agent-eval on Apr 28, 2026

@spboyer (Member) commented Apr 28, 2026

Closes #225

Summary

Adds support for evaluating VS Code custom agents (.agent.md files) alongside existing SKILL.md-based skills. Custom agents run on the same Copilot engine and share the same file structure (YAML frontmatter followed by a markdown body), but expose agent-specific frontmatter fields (tools, model, handoffs, mcp-servers, agents).

What Changed

P0 — Discovery & loading

  • New internal/skill/agent.go: AgentFrontmatter, AgentHandoff, AgentMCPServer, ParseAgentFrontmatter, IsAgentFile, LoadAgentDefinition
  • loadSkillDefinition() (copilot.go) — falls back to .agent.md when no SKILL.md present
  • discoverSkills() (orchestration) — discovers .agent.md for skill injection
  • tryParseSkill() (workspace) — workspace detection picks up .agent.md
  • discoverSkillFiles() (cmd_coverage) — coverage grid includes agent files
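The PR does not show agent.go itself, so the following is only a minimal, stdlib-only sketch of what parsing an .agent.md file involves: split the `---`-delimited YAML block from the markdown body, then read a few frontmatter fields. The names `splitFrontmatter`, `parseAgentFrontmatter`, and the trimmed `agentFrontmatter` struct are hypothetical, as are the tool names in the sample; a real implementation would presumably use a full YAML library rather than this line-based reader.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// agentFrontmatter is a hypothetical, trimmed-down stand-in for AgentFrontmatter:
// only name, model, and tools are modeled here.
type agentFrontmatter struct {
	Name  string
	Model string
	Tools []string
}

// splitFrontmatter separates the YAML block between the leading "---" lines
// from the markdown body that follows it.
func splitFrontmatter(content string) (front, body string, err error) {
	lines := strings.Split(strings.ReplaceAll(content, "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", errors.New("missing opening --- delimiter")
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return strings.Join(lines[1:i], "\n"), strings.Join(lines[i+1:], "\n"), nil
		}
	}
	return "", "", errors.New("missing closing --- delimiter")
}

// unquote strips one pair of matching single or double quotes.
func unquote(s string) string {
	if len(s) >= 2 && (s[0] == '"' || s[0] == '\'') && s[len(s)-1] == s[0] {
		return s[1 : len(s)-1]
	}
	return s
}

// parseAgentFrontmatter handles only top-level "key: value" lines plus a tools
// list in flow ("[a, b]") or block ("- a") style. It is not a YAML parser.
func parseAgentFrontmatter(front string) agentFrontmatter {
	var fm agentFrontmatter
	inTools := false // true while reading block-style items under "tools:"
	for _, line := range strings.Split(front, "\n") {
		trimmed := strings.TrimSpace(line)
		if inTools && strings.HasPrefix(trimmed, "- ") {
			fm.Tools = append(fm.Tools, unquote(strings.TrimSpace(trimmed[2:])))
			continue
		}
		// Skip blanks, comments, and indented lines (nested keys of other fields).
		if trimmed == "" || strings.HasPrefix(trimmed, "#") || line[0] == ' ' || line[0] == '\t' {
			continue
		}
		inTools = false
		key, value, ok := strings.Cut(trimmed, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.TrimSpace(key) {
		case "name":
			fm.Name = unquote(value)
		case "model":
			fm.Model = unquote(value)
		case "tools":
			if value == "" {
				inTools = true
			} else if strings.HasPrefix(value, "[") && strings.HasSuffix(value, "]") {
				for _, part := range strings.Split(value[1:len(value)-1], ",") {
					if item := unquote(strings.TrimSpace(part)); item != "" {
						fm.Tools = append(fm.Tools, item)
					}
				}
			}
		}
	}
	return fm
}

func main() {
	src := "---\nname: security-reviewer\ntools: [read, search]\n---\n# Security Reviewer\nFocus on injection flaws.\n"
	front, body, err := splitFrontmatter(src)
	if err != nil {
		panic(err)
	}
	fm := parseAgentFrontmatter(front)
	fmt.Println(fm.Name, fm.Tools)                // security-reviewer [read search]
	fmt.Println(strings.SplitN(body, "\n", 2)[0]) // # Security Reviewer
}
```

Fields such as handoffs and mcp-servers are simply skipped by this sketch; only the tools list matters for the auto-injected grader described next.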

P1 — Auto-injected tool_constraint grader

  • New internal/orchestration/agent_graders.go: augmentGradersFromAgent()
  • When an eval targets a .agent.md whose frontmatter declares tools: [...], an implicit tool_constraint grader is added with expect_tools populated from the frontmatter
  • Opt-out: if the user's eval.yaml already declares a tool_constraint grader, the implicit one is skipped
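The injection and opt-out rules above can be sketched as a pure function over the grader list. This is an illustration only: `graderSpec`, `augmentWithToolConstraint`, and the implicit grader name are hypothetical and are not the actual types or signature of augmentGradersFromAgent().

```go
package main

import "fmt"

// graderSpec is a hypothetical, simplified grader entry from eval.yaml.
type graderSpec struct {
	Name        string
	Kind        string   // e.g. "text", "prompt", "tool_constraint"
	ExpectTools []string // only meaningful for tool_constraint
}

// augmentWithToolConstraint follows the rules described for augmentGradersFromAgent:
// if the agent declares tools and the user has not declared a tool_constraint
// grader, append an implicit one whose expected tools come from the frontmatter.
func augmentWithToolConstraint(graders []graderSpec, agentTools []string) []graderSpec {
	if len(agentTools) == 0 {
		return graders // no tools: [...] in the frontmatter, nothing to inject
	}
	for _, g := range graders {
		if g.Kind == "tool_constraint" {
			return graders // opt-out: the user's own grader wins
		}
	}
	implicit := graderSpec{
		Name:        "implicit_agent_tools", // hypothetical name
		Kind:        "tool_constraint",
		ExpectTools: append([]string(nil), agentTools...), // copy to avoid aliasing
	}
	// Build a new slice so the caller's grader list is never mutated in place.
	out := make([]graderSpec, 0, len(graders)+1)
	out = append(out, graders...)
	return append(out, implicit)
}

func main() {
	user := []graderSpec{{Name: "mentions_injection", Kind: "text"}}
	got := augmentWithToolConstraint(user, []string{"read", "search"})
	fmt.Println(len(got), got[1].Kind, got[1].ExpectTools) // 2 tool_constraint [read search]
}
```

Returning a fresh slice leaves the parsed eval.yaml untouched, which keeps the injection, opt-out, and no-tools cases easy to test independently.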

P1 — Example suite

examples/custom-agent/:

  • security-reviewer.agent.md — realistic security-review agent with tools: declared
  • eval.yaml — uses text + prompt graders (tool_constraint auto-injected)
  • tasks/ — 3 tasks: SQL injection, XSS, clean-code (negative case)
  • fixtures/: vulnerable.py, xss.html, clean.go (excluded from the module build with a //go:build ignore tag)
  • trigger_tests.yaml — should/shouldn't trigger prompts
  • README.md — walkthrough

Docs

  • New guide: site/src/content/docs/guides/custom-agents.mdx (Evaluating Custom Agents)
  • eval-yaml.mdx — added "Targeting Custom Agents" section
  • graders.mdx — auto-injection callout on tool_constraint
  • reference/cli.mdx: notes on .agent.md discovery for waza run and waza coverage
  • Sidebar updated; README updated

Design decisions

  • SKILL.md wins when both files exist in the same directory (no behavior change for existing skills)
  • One agent per directory — first .agent.md match is used
  • Agents reuse SkillInfo — minimal blast radius, no parallel type hierarchy
  • Implicit tool_constraint is opt-out — declaring your own tool_constraint grader disables the implicit one

Testing

  • 21 new tests across internal/skill, internal/orchestration, cmd/waza — all pass
  • Full go test ./... green
  • go vet ./... clean
  • Site builds (18 pages, including new custom-agents guide)

Out of Scope (future work for #225)

  • handoffs and mcp-servers frontmatter fields are parsed but not yet wired into evals (P2)
  • No special handoff testing yet (P2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI added 2 commits April 28, 2026 14:42
- Add AgentFrontmatter types in internal/skill/agent.go
- Extend loadSkillDefinition() to detect .agent.md files
- Extend discoverSkills() for agent file discovery
- Extend workspace detection for .agent.md
- Extend coverage command to include .agent.md files
- Add comprehensive tests for agent frontmatter parsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions[bot] enabled auto-merge (squash) April 28, 2026 18:45
Copilot AI added 3 commits April 28, 2026 14:55
- New guide: Evaluating Custom Agents with tool constraint validation
- Update eval-yaml guide: add agent targeting and custom agents section
- Update graders guide: add callout for auto-injected tool_constraint
- Update CLI reference: document .agent.md discovery in coverage and run
- Add custom-agents to sidebar navigation
- Update README.md with custom agents support note

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nt example #225

P1 scope for #225:
- Auto-inject tool_constraint grader when eval targets a .agent.md with tools field
- Skip injection if user already defined a tool_constraint grader (opt-out)
- Add LoadAgentDefinition() helper in internal/skill/agent.go
- Add examples/custom-agent/ with security-reviewer agent, tasks, and fixtures
- 9 new tests covering injection, opt-out, no-tools, non-agent, and missing file cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The clean.go fixture imports a SQL driver to demonstrate parameterized queries
for the security-reviewer agent eval, but it isn't part of the module build.
Add //go:build ignore to keep `go test ./...` clean.

Also includes Livingston's history + decision file for the docs work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer (Member, Author) commented Apr 28, 2026

CI status: all checks related to these changes pass ✅

Check                           Status
Build and Verify Docker Image   ✅ pass
Lint                            ✅ pass
test                            ✅ pass
ubuntu-latest                   ✅ pass
windows-latest                  ✅ pass
license/cla                     ✅ pass
Run Waza Evaluation             ❌ pre-existing failure (see #227)

The "Run Waza Evaluation" failure is not caused by this PR. I reproduced the identical failure on a fresh clone of main: examples/code-explainer/eval.yaml returns a 0% pass rate on main as well. Filed as #227. It is likely fallout from the recent BenchmarkSpec→EvalSpec rename refactor (#222).

Local verification on this branch:

  • go test ./... — all pass
  • go vet ./... — clean
  • cd site && npm run build — 18 pages built successfully (incl. new custom-agents guide)

Ready for review.

github-actions[bot] merged commit 653a54e into main Apr 28, 2026
6 of 7 checks passed
github-actions[bot] pushed a commit that referenced this pull request Apr 28, 2026
- Add .agent.md coverage to quick-start.mdx, getting-started.mdx,
  docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and
  eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer mentioned this pull request Apr 28, 2026

Development

Successfully merging this pull request may close these issues.

feat: Support VS Code custom agent (.agent.md) evaluation

2 participants