Summary
`examples/custom-agent/eval.yaml` appears to be invalid against the current eval schema and spec model. It is missing the required top-level `skill` and `metrics` fields, and it sets `executor: copilot`, while the current schema and runtime accept only `copilot-sdk` and `mock`.
I could not find a .github/ISSUE_TEMPLATE directory in this repository, so I am using this plain bug report format.
Evidence
The example currently has:
- no top-level `skill:` or `agent:` field
- no `metrics:` section
- `executor: copilot`
Source (`waza/examples/custom-agent/eval.yaml`, lines 1 to 27 at commit 0f5f245):
```yaml
name: security-reviewer-eval
description: Evaluates the security-reviewer custom agent

config:
  executor: copilot
  model: claude-sonnet-4
  trials_per_task: 1
  timeout_seconds: 120
  parallel: false

tasks:
  - tasks/*.yaml

graders:
  - type: text
    name: identifies_severity
    config:
      regex_match:
        - "(?i)severity"

  - type: prompt
    name: review_quality
    rubric: |
      Score 1-5 how well the response identified security issues:
      5 — found all vulnerabilities with accurate severity and clear remediation
      3 — found most issues but missed some details
      1 — missed major vulnerabilities or wrong severity
```
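The schema (`waza/schemas/eval.schema.json`, lines 7 to 14) requires `name`, `skill`, `config`, `metrics`, and `tasks`, and disallows additional top-level properties: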
```json
"required": [
  "name",
  "skill",
  "config",
  "metrics",
  "tasks"
],
"additionalProperties": false,
```
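To make the failure concrete, here is a minimal Go sketch of the `required` check applied to the current file's top-level keys. It is an illustration only: `missingRequired` and `requiredTopLevel` are hypothetical names, not waza's validator, and the YAML is modeled as a plain map rather than parsed. It does not model `additionalProperties: false`, since the schema's full `properties` list is not quoted here.

```go
package main

import "fmt"

// requiredTopLevel mirrors the "required" list in eval.schema.json quoted above.
// The name is hypothetical; waza's own validation code may look different.
var requiredTopLevel = []string{"name", "skill", "config", "metrics", "tasks"}

// missingRequired returns, in schema order, the required top-level keys absent from doc.
func missingRequired(doc map[string]any) []string {
	var missing []string
	for _, key := range requiredTopLevel {
		if _, ok := doc[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Top-level keys of the current examples/custom-agent/eval.yaml; nested values are elided.
	current := map[string]any{
		"name":        "security-reviewer-eval",
		"description": "Evaluates the security-reviewer custom agent",
		"config":      map[string]any{},
		"tasks":       []any{"tasks/*.yaml"},
		"graders":     []any{},
	}
	fmt.Println(missingRequired(current)) // [skill metrics]
}
```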
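The schema (`waza/schemas/eval.schema.json`, lines 93 to 100) also requires `config.executor`, alongside `trials_per_task`, `timeout_seconds`, and `model`: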
```json
"config": {
  "type": "object",
  "required": [
    "trials_per_task",
    "timeout_seconds",
    "executor",
    "model"
  ],
```
The `executor` enum (`waza/schemas/eval.schema.json`, lines 130 to 135) allows only `copilot-sdk` and `mock`:
```json
"type": "string",
"enum": [
  "copilot-sdk",
  "mock"
],
"description": "Execution engine to use. 'copilot-sdk' for real evaluations, 'mock' for testing."
```
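The two config fragments can be sketched the same way. `checkConfig`, `requiredConfig`, and `allowedExecutors` are hypothetical names, and the config block is again modeled as a map instead of parsed YAML. Applied to the current example, all required config keys are present, so the only problem reported is the executor value.

```go
package main

import "fmt"

// Hypothetical mirrors of the two schema fragments quoted above: the config
// "required" list and the executor "enum".
var (
	requiredConfig   = []string{"trials_per_task", "timeout_seconds", "executor", "model"}
	allowedExecutors = map[string]bool{"copilot-sdk": true, "mock": true}
)

// checkConfig reports problems in a config block modeled as a map. Non-string
// executor values are not type-checked here; a real schema validator would flag them.
func checkConfig(cfg map[string]any) []string {
	var problems []string
	for _, key := range requiredConfig {
		if _, ok := cfg[key]; !ok {
			problems = append(problems, "missing config."+key)
		}
	}
	if exec, ok := cfg["executor"].(string); ok && !allowedExecutors[exec] {
		problems = append(problems, fmt.Sprintf("config.executor %q is not one of copilot-sdk, mock", exec))
	}
	return problems
}

func main() {
	// The config block of the current example.
	current := map[string]any{
		"executor":        "copilot",
		"model":           "claude-sonnet-4",
		"trials_per_task": 1,
		"timeout_seconds": 120,
		"parallel":        false,
	}
	for _, p := range checkConfig(current) {
		fmt.Println(p) // config.executor "copilot" is not one of copilot-sdk, mock
	}
}
```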
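The runtime switch (`waza/cmd/waza/cmd_run.go`, lines 638 to 647) likewise handles only `mock` and `copilot-sdk`; any other value falls through to the `default` error: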
```go
case "mock":
	engine = execution.NewMockEngine(spec.Config.ModelID)
case "copilot-sdk":
	engine = execution.NewCopilotEngineBuilder(spec.Config.ModelID, &execution.CopilotEngineBuilderOptions{
		NewCopilotClient: newCopilotClientFn, // if nil, uses the real function, otherwise overridable for tests.
	}).Build()
default:
	return nil, fmt.Errorf("unknown engine type: %s", spec.Config.EngineType)
}
if keepWorkspace {
```
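The runtime behavior can be reduced to a self-contained Go sketch. `engineKind` is hypothetical and returns a label instead of constructing an engine; it assumes, as the quoted error message suggests, that the YAML `executor` value is what ends up in `spec.Config.EngineType`. If schema validation runs first, the current file would likely be rejected before ever reaching this switch.

```go
package main

import "fmt"

// engineKind is a hypothetical stand-in for the engine constructors in cmd_run.go;
// it returns a label instead of building a real engine.
func engineKind(engineType string) (string, error) {
	switch engineType {
	case "mock":
		return "mock engine", nil
	case "copilot-sdk":
		return "copilot engine", nil
	default:
		// Same message format as the default branch quoted above.
		return "", fmt.Errorf("unknown engine type: %s", engineType)
	}
}

func main() {
	for _, t := range []string{"copilot-sdk", "copilot"} {
		kind, err := engineKind(t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(kind)
	}
	// copilot engine
	// error: unknown engine type: copilot
}
```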
Expected
The checked-in custom-agent example should be copy/paste runnable with the current CLI and schema.
Actual
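The example appears to be either stale or ahead of the current parser and schema. It likely fails schema validation or spec loading before the custom agent is ever evaluated: the schema requires the missing `skill` and `metrics` fields, and the runtime switch has no case for `copilot`.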
Suggested fix
Update `examples/custom-agent/eval.yaml` to match the supported schema. For example, if custom agents are still addressed through skill discovery:
```yaml
name: security-reviewer-eval
description: Evaluates the security-reviewer custom agent
skill: security-reviewer
version: "1.0"
config:
  executor: copilot-sdk
  model: claude-sonnet-4.6
  trials_per_task: 1
  timeout_seconds: 120
  parallel: false
metrics:
  - name: review_quality
    weight: 1.0
    threshold: 0.8
tasks:
  - tasks/*.yaml
```
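If `agent:` is intended to be the supported target field instead, this example should probably be fixed together with parser and schema support for `agent:`. As the schema stands, `skill` is required, so a file that names only an `agent:` would still fail validation.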
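As a quick sanity check of the skill-based suggestion, the hypothetical `validate` function below combines the three schema rules quoted in this issue (required top-level keys, required config keys, executor enum) and runs them against the suggested file, modeled as nested maps. It is a simplification, not real JSON Schema validation, so it cannot confirm that optional fields such as `version` or `description` are permitted under `additionalProperties: false`.

```go
package main

import "fmt"

// validate is a hypothetical, simplified stand-in for schema validation. It checks
// only the three rules quoted in this issue.
func validate(doc map[string]any) []string {
	var problems []string
	for _, k := range []string{"name", "skill", "config", "metrics", "tasks"} {
		if _, ok := doc[k]; !ok {
			problems = append(problems, "missing "+k)
		}
	}
	cfg, _ := doc["config"].(map[string]any) // nil map if config is absent or not a mapping
	for _, k := range []string{"trials_per_task", "timeout_seconds", "executor", "model"} {
		if _, ok := cfg[k]; !ok {
			problems = append(problems, "missing config."+k)
		}
	}
	if exec, ok := cfg["executor"].(string); ok && exec != "copilot-sdk" && exec != "mock" {
		problems = append(problems, "unsupported executor "+exec)
	}
	return problems
}

func main() {
	// The suggested fix, modeled as nested maps (values copied from the YAML above).
	suggested := map[string]any{
		"name":        "security-reviewer-eval",
		"description": "Evaluates the security-reviewer custom agent",
		"skill":       "security-reviewer",
		"version":     "1.0",
		"config": map[string]any{
			"executor":        "copilot-sdk",
			"model":           "claude-sonnet-4.6",
			"trials_per_task": 1,
			"timeout_seconds": 120,
			"parallel":        false,
		},
		"metrics": []any{map[string]any{"name": "review_quality", "weight": 1.0, "threshold": 0.8}},
		"tasks":   []any{"tasks/*.yaml"},
	}
	fmt.Println(len(validate(suggested)), "problems") // 0 problems
}
```

Running the same function on the current file reports the missing `skill` and `metrics` keys and the unsupported `copilot` executor.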