examples: custom-agent eval.yaml is invalid against current schema #274

Description

@coygeek

Summary

examples/custom-agent/eval.yaml appears invalid against the current eval schema/model. It is missing the required top-level fields skill and metrics, and it uses executor: copilot, whereas the current schema and source accept only copilot-sdk and mock.

I could not find a .github/ISSUE_TEMPLATE directory in this repository, so I am using this plain bug report format.

Evidence

The example currently contains:

  • no top-level skill: or agent:
  • no metrics: section
  • executor: copilot

Source:

name: security-reviewer-eval
description: Evaluates the security-reviewer custom agent
config:
  executor: copilot
  model: claude-sonnet-4
  trials_per_task: 1
  timeout_seconds: 120
  parallel: false
tasks:
  - tasks/*.yaml
graders:
  - type: text
    name: identifies_severity
    config:
      regex_match:
        - "(?i)severity"
  - type: prompt
    name: review_quality
    rubric: |
      Score 1-5 how well the response identified security issues:
      5 — found all vulnerabilities with accurate severity and clear remediation
      3 — found most issues but missed some details
      1 — missed major vulnerabilities or wrong severity

The schema requires name, skill, config, metrics, and tasks:

"required": [
  "name",
  "skill",
  "config",
  "metrics",
  "tasks"
],
"additionalProperties": false,

The schema requires config.executor:

"config": {
  "type": "object",
  "required": [
    "trials_per_task",
    "timeout_seconds",
    "executor",
    "model"
  ],

The allowed executor enum is only copilot-sdk or mock:

"type": "string",
"enum": [
  "copilot-sdk",
  "mock"
],
"description": "Execution engine to use. 'copilot-sdk' for real evaluations, 'mock' for testing."

The runtime switch also only handles mock and copilot-sdk:

waza/cmd/waza/cmd_run.go

Lines 638 to 647 in 0f5f245

	case "mock":
		engine = execution.NewMockEngine(spec.Config.ModelID)
	case "copilot-sdk":
		engine = execution.NewCopilotEngineBuilder(spec.Config.ModelID, &execution.CopilotEngineBuilderOptions{
			NewCopilotClient: newCopilotClientFn, // if nil, uses the real function, otherwise overridable for tests.
		}).Build()
	default:
		return nil, fmt.Errorf("unknown engine type: %s", spec.Config.EngineType)
	}
	if keepWorkspace {

Expected

The checked-in custom-agent example should be copy/paste runnable with the current CLI and schema.

Actual

The example appears to be stale or ahead of the current parser/schema. It likely fails validation/loading before evaluating the custom agent.
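
To make the mismatch concrete, here is a minimal Go sketch that applies only the constraints quoted above (the required top-level fields, the required config fields, and the executor enum) to a hand-written map mirroring the current example. It is not waza's validator: checkSpec and currentExample are hypothetical names introduced for illustration, the map stands in for the parsed YAML, and additionalProperties is not checked because the full property list is not quoted in this issue.

```go
package main

import "fmt"

// Constraints copied from the schema excerpts quoted in this issue.
var requiredTopLevel = []string{"name", "skill", "config", "metrics", "tasks"}
var requiredConfig = []string{"trials_per_task", "timeout_seconds", "executor", "model"}
var allowedExecutors = map[string]bool{"copilot-sdk": true, "mock": true}

// checkSpec is a hypothetical helper (not part of waza). It reports which of
// the quoted constraints a parsed spec violates, in a fixed order.
func checkSpec(spec map[string]any) []string {
	var problems []string
	for _, key := range requiredTopLevel {
		if _, ok := spec[key]; !ok {
			problems = append(problems, "missing required field: "+key)
		}
	}
	config, ok := spec["config"].(map[string]any)
	if !ok {
		// Without a config object there is nothing further to inspect.
		return problems
	}
	for _, key := range requiredConfig {
		if _, ok := config[key]; !ok {
			problems = append(problems, "missing required field: config."+key)
		}
	}
	if executor, ok := config["executor"].(string); ok && !allowedExecutors[executor] {
		problems = append(problems, "unsupported executor: "+executor)
	}
	return problems
}

// currentExample mirrors the top-level shape of the checked-in eval.yaml.
// The graders list is left empty because its contents do not affect these checks.
func currentExample() map[string]any {
	return map[string]any{
		"name":        "security-reviewer-eval",
		"description": "Evaluates the security-reviewer custom agent",
		"config": map[string]any{
			"executor":        "copilot",
			"model":           "claude-sonnet-4",
			"trials_per_task": 1,
			"timeout_seconds": 120,
			"parallel":        false,
		},
		"tasks":   []any{"tasks/*.yaml"},
		"graders": []any{},
	}
}

func main() {
	for _, problem := range checkSpec(currentExample()) {
		fmt.Println(problem)
	}
}
```

Under these assumptions the sketch reports a missing skill, a missing metrics section, and the unsupported executor copilot, which matches the three points listed under Evidence.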

Suggested fix

Update examples/custom-agent/eval.yaml to match the supported schema. For example, if custom agents are still addressed through skill discovery:

name: security-reviewer-eval
description: Evaluates the security-reviewer custom agent
skill: security-reviewer
version: "1.0"

config:
  executor: copilot-sdk
  model: claude-sonnet-4.6
  trials_per_task: 1
  timeout_seconds: 120
  parallel: false

metrics:
  - name: review_quality
    weight: 1.0
    threshold: 0.8

tasks:
  - tasks/*.yaml

If agent: is intended to be the supported target field instead, this example should probably be fixed together with parser/schema support for agent:.
