feat: Support VS Code custom agent (.agent.md) evaluation #225

Description

@spboyer

Summary

Add support for evaluating VS Code custom agents (.agent.md files) alongside existing SKILL.md-based skills. Custom agents use the same Copilot engine but have a different file format and frontmatter schema.

Motivation

Custom agents (VS Code docs) are .agent.md files with YAML frontmatter defining tools, model preferences, handoffs, and MCP servers. They are the primary way teams build specialized AI personas (security reviewers, planners, implementers). Waza should evaluate them with the same rigor as skills.

Changes Required

P0 — File Discovery

  • Extend discoverSkills() in internal/orchestration/skill_discovery.go to detect .agent.md files alongside SKILL.md
  • Extend buildSkillSystemMessage() in internal/execution/copilot.go to load .agent.md content
  • Extend loadSkillDefinition() to parse agent frontmatter (name, description, tools, model)
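
The discovery change could look roughly like the sketch below. It is a minimal illustration, not the existing discoverSkills() implementation: the Definition type, the DefinitionKind constants, and the classify and discoverDefinitions helpers are hypothetical names introduced here, and the real function may walk directories or filter paths differently.

```go
package main

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// DefinitionKind is a hypothetical tag for the two supported file formats.
type DefinitionKind string

const (
	KindSkill DefinitionKind = "skill" // a SKILL.md file
	KindAgent DefinitionKind = "agent" // a *.agent.md file
)

// Definition is a hypothetical record for one discovered file.
type Definition struct {
	Path string
	Kind DefinitionKind
}

// classify reports which kind of definition a file name represents.
// The second result is false for files that are neither format.
func classify(name string) (DefinitionKind, bool) {
	switch {
	case name == "SKILL.md":
		return KindSkill, true
	case strings.HasSuffix(name, ".agent.md") && name != ".agent.md":
		return KindAgent, true
	}
	return "", false
}

// discoverDefinitions walks root and collects every SKILL.md and
// *.agent.md file. filepath.WalkDir visits entries in lexical order,
// so the result is deterministic for a given tree.
func discoverDefinitions(root string) ([]Definition, error) {
	var found []Definition
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if kind, ok := classify(d.Name()); ok {
			found = append(found, Definition{Path: path, Kind: kind})
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return found, nil
}
```

Keeping the file-name check in a separate classify function means the matching rule can be unit tested without touching the filesystem, and later stages can branch on Kind when choosing a parser.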

P0 — Agent Frontmatter Parsing

  • Parse agent-specific fields: tools, model, handoffs, mcp-servers, agents
  • Map to waza concepts: tools → tool constraints, model → eval model, handoffs → follow-up workflows
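
Before any agent-specific field can be parsed, the YAML frontmatter has to be separated from the Markdown body. The sketch below shows one way to do that with the standard library only. splitFrontmatter and errNoFrontmatter are hypothetical names, and the sketch assumes the frontmatter is delimited by lines containing only `---`. Decoding the returned YAML text into tools, model, handoffs, mcp-servers, and agents is left to whichever YAML library the project already uses.

```go
package main

import (
	"errors"
	"strings"
)

// errNoFrontmatter is returned when the content does not start with a
// '---' delimited block. Hypothetical error value for this sketch.
var errNoFrontmatter = errors.New("no YAML frontmatter found")

// splitFrontmatter separates the leading YAML frontmatter from the
// Markdown body of an .agent.md file. It does not parse the YAML.
func splitFrontmatter(content string) (frontmatter, body string, err error) {
	// Normalize Windows line endings so delimiter matching is uniform.
	content = strings.ReplaceAll(content, "\r\n", "\n")
	lines := strings.Split(content, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", errNoFrontmatter
	}
	// Look for the closing delimiter after the opening line.
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			frontmatter = strings.Join(lines[1:i], "\n")
			body = strings.TrimLeft(strings.Join(lines[i+1:], "\n"), "\n")
			return frontmatter, body, nil
		}
	}
	// Opening delimiter without a closing one.
	return "", "", errNoFrontmatter
}
```

The body returned here corresponds to the agent's instructions, which the mapping treats as the content to inject, while the frontmatter string carries the fields that map to eval configuration.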

P1 — Auto-Generated Tool Constraints

  • When an .agent.md specifies tools: ['search/codebase', 'web/fetch'], automatically validate that the agent used only those tools during eval
  • Generate an implicit tool_constraint grader from the agent frontmatter
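
The core check behind such an implicit grader can be sketched as a set difference between the tools the agent declared and the tools it actually called. disallowedTools is a hypothetical helper, not the existing tool_constraint grader, and it assumes tool names from the eval transcript use the same strings as the frontmatter.

```go
package main

import "sort"

// disallowedTools returns the sorted, de-duplicated names of tools that
// appear in used but not in allowed. An empty result means the agent
// stayed within its declared tool list.
func disallowedTools(allowed, used []string) []string {
	allowedSet := make(map[string]struct{}, len(allowed))
	for _, t := range allowed {
		allowedSet[t] = struct{}{}
	}

	seen := make(map[string]struct{})
	var violations []string
	for _, t := range used {
		if _, ok := allowedSet[t]; ok {
			continue // declared in frontmatter, so permitted
		}
		if _, dup := seen[t]; dup {
			continue // report each offending tool once
		}
		seen[t] = struct{}{}
		violations = append(violations, t)
	}
	sort.Strings(violations)
	return violations
}
```

A grader built on this would pass when the returned slice is empty and otherwise report the offending tool names. How an agent with no tools field should be treated (unrestricted, or no tools allowed) is a design decision this sketch does not settle.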

P1 — Example Eval Suite

  • Add examples/custom-agent/ with a sample .agent.md + eval.yaml + tasks + trigger tests
  • Demonstrate tool restriction validation, output quality grading, handoff testing

Documentation

  • Update the site CLI reference and the eval-yaml guide
  • Add an "Evaluating Custom Agents" guide to the docs site
  • Update README

Agent ↔ Waza Mapping

| Agent Property | Waza Equivalent |
| --- | --- |
| tools | tool_calls / tool_constraint grader |
| model | config.model |
| description | Trigger testing |
| Body instructions | Skill content injection |
| handoffs | follow_up_prompts + action_sequence |
| mcp-servers | config.mcp_servers |
