examples: custom-agent eval.yaml is invalid against current schema #274

## Summary

`examples/custom-agent/eval.yaml` appears to be invalid against the current eval schema. It omits the required top-level fields `skill` and `metrics`, and it sets `executor: copilot`, while both the schema and the runtime switch accept only `copilot-sdk` and `mock`.

I could not find a `.github/ISSUE_TEMPLATE` directory in this repository, so I am using this plain bug report format.

## Evidence

The example currently contains:

- no top-level `skill:` or `agent:`
- no `metrics:` section
- `executor: copilot`

Source: https://github.com/microsoft/waza/blob/0f5f24508a075dd3e11f0fde4162f447bf16540d/examples/custom-agent/eval.yaml#L1-L27

The schema requires `name`, `skill`, `config`, `metrics`, and `tasks`:

https://github.com/microsoft/waza/blob/0f5f24508a075dd3e11f0fde4162f447bf16540d/schemas/eval.schema.json#L7-L14

The schema requires `config.executor`:

https://github.com/microsoft/waza/blob/0f5f24508a075dd3e11f0fde4162f447bf16540d/schemas/eval.schema.json#L93-L100

The allowed executor enum is only `copilot-sdk` or `mock`:

https://github.com/microsoft/waza/blob/0f5f24508a075dd3e11f0fde4162f447bf16540d/schemas/eval.schema.json#L130-L135

The runtime switch also only handles `mock` and `copilot-sdk`:

https://github.com/microsoft/waza/blob/0f5f24508a075dd3e11f0fde4162f447bf16540d/cmd/waza/cmd_run.go#L638-L647
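The schema constraints cited above can be sketched as a small standalone check. This is an illustration only, not waza's validator: `checkEval` and `currentExample` are hypothetical names, the document is assumed to be already decoded into a `map[string]any`, and the required keys and executor values are copied from the linked schema lines. The sketch does not check `additionalProperties`, because the schema's full property list is not quoted in this issue.

```go
package main

import "fmt"

// Required keys and allowed executors, copied from the schema lines linked above.
var (
	requiredTopLevel = []string{"name", "skill", "config", "metrics", "tasks"}
	requiredConfig   = []string{"trials_per_task", "timeout_seconds", "executor", "model"}
	allowedExecutors = map[string]bool{"copilot-sdk": true, "mock": true}
)

// checkEval is a hypothetical helper (not part of waza). It inspects an
// already-decoded eval document and returns the problems it finds, in a
// stable order.
func checkEval(doc map[string]any) []string {
	var problems []string
	for _, key := range requiredTopLevel {
		if _, ok := doc[key]; !ok {
			problems = append(problems, "missing required field: "+key)
		}
	}
	// A missing or non-object config yields a nil map, which is safe to read.
	config, _ := doc["config"].(map[string]any)
	for _, key := range requiredConfig {
		if _, ok := config[key]; !ok {
			problems = append(problems, "missing required field: config."+key)
		}
	}
	if executor, ok := config["executor"].(string); ok && !allowedExecutors[executor] {
		problems = append(problems, "unsupported config.executor: "+executor)
	}
	return problems
}

// currentExample returns the decoded shape of the checked-in example as
// described in this issue (grader details elided).
func currentExample() map[string]any {
	return map[string]any{
		"name":        "security-reviewer-eval",
		"description": "Evaluates the security-reviewer custom agent",
		"config": map[string]any{
			"executor":        "copilot",
			"model":           "claude-sonnet-4",
			"trials_per_task": 1,
			"timeout_seconds": 120,
			"parallel":        false,
		},
		"tasks":   []any{"tasks/*.yaml"},
		"graders": []any{},
	}
}

func main() {
	for _, problem := range checkEval(currentExample()) {
		fmt.Println(problem)
	}
}
```

Under these assumptions the sketch reports three problems for the current example: the missing `skill`, the missing `metrics`, and the unsupported `copilot` executor.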

## Expected

The checked-in custom-agent example should be copy/paste runnable with the current CLI and schema.

## Actual

The example appears to be stale or ahead of the current parser/schema. It likely fails validation/loading before evaluating the custom agent.
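Even if the file got past schema validation, the runtime switch linked under Evidence has no case for `copilot`. The following sketch mirrors only the shape of that switch: `pickEngine` is a hypothetical stand-in, the returned strings replace the real engine values, and it assumes that `config.executor` from the YAML ends up in `spec.Config.EngineType`, which this issue has not verified.

```go
package main

import "fmt"

// pickEngine is a hypothetical stand-in for the engine selection in
// cmd/waza/cmd_run.go. It returns a label instead of a real engine.
func pickEngine(engineType string) (string, error) {
	switch engineType {
	case "mock":
		return "mock engine", nil
	case "copilot-sdk":
		return "copilot engine", nil
	default:
		// Same error format as the default branch in the linked source.
		return "", fmt.Errorf("unknown engine type: %s", engineType)
	}
}

func main() {
	for _, executor := range []string{"copilot", "copilot-sdk"} {
		engine, err := pickEngine(executor)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ok:", engine)
	}
}
```

With the value from the current example, the default branch is taken and the error reads `unknown engine type: copilot`; with `copilot-sdk` an engine is selected.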

## Suggested fix

Update `examples/custom-agent/eval.yaml` to match the supported schema. For example, if custom agents are still addressed through skill discovery:

```yaml
name: security-reviewer-eval
description: Evaluates the security-reviewer custom agent
skill: security-reviewer
version: "1.0"

config:
  executor: copilot-sdk
  model: claude-sonnet-4.6
  trials_per_task: 1
  timeout_seconds: 120
  parallel: false

metrics:
  - name: review_quality
    weight: 1.0
    threshold: 0.8

tasks:
  - tasks/*.yaml
```

If `agent:` is intended to be the supported target field instead, this example should probably be fixed together with parser/schema support for `agent:`.


Excerpts from the permalinks cited under Evidence:

`schemas/eval.schema.json#L7-L14`:

    "required": [
      "name",
      "skill",
      "config",
      "metrics",
      "tasks"
    ],
    "additionalProperties": false,

`schemas/eval.schema.json#L93-L100`:

    "config": {
      "type": "object",
      "required": [
        "trials_per_task",
        "timeout_seconds",
        "executor",
        "model"
      ],

`schemas/eval.schema.json#L130-L135`:

    "type": "string",
    "enum": [
      "copilot-sdk",
      "mock"
    ],
    "description": "Execution engine to use. 'copilot-sdk' for real evaluations, 'mock' for testing."

`cmd/waza/cmd_run.go#L638-L647`:

    case "mock":
        engine = execution.NewMockEngine(spec.Config.ModelID)
    case "copilot-sdk":
        engine = execution.NewCopilotEngineBuilder(spec.Config.ModelID, &execution.CopilotEngineBuilderOptions{
            NewCopilotClient: newCopilotClientFn, // if nil, uses the real function, otherwise overridable for tests.
        }).Build()
    default:
        return nil, fmt.Errorf("unknown engine type: %s", spec.Config.EngineType)
    }
    if keepWorkspace {


`examples/custom-agent/eval.yaml#L1-L27`:

    name: security-reviewer-eval
    description: Evaluates the security-reviewer custom agent

    config:
      executor: copilot
      model: claude-sonnet-4
      trials_per_task: 1
      timeout_seconds: 120
      parallel: false

    tasks:
      - tasks/*.yaml

    graders:
      - type: text
        name: identifies_severity
        config:
          regex_match:
            - "(?i)severity"

      - type: prompt
        name: review_quality
        rubric: |
          Score 1-5 how well the response identified security issues:
          5 — found all vulnerabilities with accurate severity and clear remediation
          3 — found most issues but missed some details
          1 — missed major vulnerabilities or wrong severity
