Summary
In a multi-skill workspace, both waza coverage . and waza run evals/.../eval.yaml recognize the eval structure, but waza check does not consistently discover the related eval suite.
This makes for a confusing user experience. The workspace appears valid for coverage and execution, but readiness checks either report that no eval exists or fail to find the skill.
Environment
- Waza version: 0.31.0
- OS: Windows
- Shell: PowerShell
Workspace shape
This was reproduced in a repository with this structure:
repo/
├── .waza.yaml
├── skills/
│ ├── development/skill-creator/SKILL.md
│ ├── copilot-exclusive/mcp-ecosystem/SKILL.md
│ ├── copilot-exclusive/fleet-parallel/SKILL.md
│ ├── testing/eval-harness/SKILL.md
│ └── copilot-exclusive/background-agent/SKILL.md
└── evals/
    ├── skill-creator/eval.yaml
    ├── mcp-ecosystem/eval.yaml
    ├── fleet-parallel/eval.yaml
    ├── eval-harness/eval.yaml
    └── background-agent/eval.yaml
The root .waza.yaml includes:
paths:
  skills: skills/
  evals: evals/
  results: results/
server:
  resultsDir: results
Steps to reproduce
From the workspace root:
waza coverage .
waza check skills/development/skill-creator
waza check skill-creator
waza run evals/skill-creator/eval.yaml --task create-dev-skill-001 --trials 1 --output-dir results -v
Actual behavior
waza coverage .
Coverage recognizes the pilot evals and shows the skill as partially covered.
Example outcome:
- skill-creator appears with 2 tasks
- other pilot skills also appear in the coverage grid
waza run evals/skill-creator/eval.yaml ...
Run executes successfully enough to:
- load the skill directories
- run the selected task
- apply graders
- save a result JSON file under results/...
waza check skills/development/skill-creator
Output includes:
Evaluation Suite: Not Found
waza check skill-creator
Output includes:
no SKILL.md found in <workspace root>
Expected behavior
waza check should discover the corresponding eval suite consistently when:
- the workspace has a valid root .waza.yaml
- the skill exists under skills/.../<skill-name>/SKILL.md
- the eval exists under evals/<skill-name>/eval.yaml
- coverage and run already succeed against the same workspace
At minimum, check should either:
- locate the matching eval and report it, or
- explain the discovery rule clearly enough that the user can resolve the mismatch
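The resolution rule this report expects can be written down as a small sketch. It is not waza's implementation. resolve_skill_dir and expected_eval_path are hypothetical helpers, and the skills/ and evals/ roots are hard-coded to mirror the .waza.yaml above rather than parsed from it. The sketch accepts either a skill directory or a bare skill name, which are the two invocations tried in the reproduction steps. It then derives the eval path from the name of the folder that contains SKILL.md.

```python
from pathlib import Path
from typing import Optional

# Assumed roots, mirroring paths.skills and paths.evals in the .waza.yaml above.
SKILLS_ROOT = "skills"
EVALS_ROOT = "evals"


def resolve_skill_dir(workspace: Path, arg: str) -> Optional[Path]:
    """Hypothetical: accept a skill directory path or a bare skill name."""
    candidate = workspace / arg
    if (candidate / "SKILL.md").is_file():
        # Argument was a path such as skills/development/skill-creator.
        return candidate
    # Otherwise treat it as a bare name and search skills/**/<arg>/SKILL.md.
    matches = sorted(
        p.parent
        for p in (workspace / SKILLS_ROOT).rglob("SKILL.md")
        if p.parent.name == arg
    )
    # Only accept an unambiguous match; None lets the caller ask for a full path.
    return matches[0] if len(matches) == 1 else None


def expected_eval_path(workspace: Path, skill_dir: Path) -> Path:
    """Hypothetical: evals/<skill-name>/eval.yaml, keyed on the skill folder name."""
    return workspace / EVALS_ROOT / skill_dir.name / "eval.yaml"
```

Under this rule, skills/development/skill-creator and skill-creator both resolve to evals/skill-creator/eval.yaml in the workspace above. Returning None for an ambiguous bare name is a deliberate choice. Two skills with the same folder name in different categories would otherwise be matched silently.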
Why this matters
For new users, check looks like the natural readiness command. When it disagrees with coverage and run, it is hard to tell whether:
- the workspace is invalid
- the eval naming/path convention is wrong
- check is using different discovery rules
This makes initial adoption harder, especially in multi-skill repositories.
Additional note
This may be either:
- a discovery bug in check, or
- a documentation/UX gap where check expects a different invocation pattern than users would infer from coverage and run
Either way, the current behavior is surprising.
Possible improvements
Any of these would help:
- Make check reuse the same discovery logic as coverage
- Document the exact resolution rule from skill path/name to eval path
- Add debug output showing where check looked for the eval
- Improve the error message when the skill name/path is interpreted differently than expected
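The debug-output suggestion could look roughly like the sketch below. The lookup records every path it probes and lists them when nothing matches. EvalLookup and locate_eval are hypothetical names, and the candidate list is an assumption. Its first entry follows this workspace's evals/<skill-name>/eval.yaml layout. The second, an eval.yaml next to SKILL.md, is purely illustrative. The sketch does not describe which paths waza check actually probes.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class EvalLookup:
    """Hypothetical result of an eval lookup, including every path that was tried."""
    found: Optional[Path] = None
    probed: List[Path] = field(default_factory=list)

    def report(self) -> str:
        if self.found is not None:
            return f"Evaluation Suite: {self.found}"
        # On failure, list each probed location so the user can spot the mismatch.
        lines = ["Evaluation Suite: Not Found", "Looked in:"]
        lines += [f"  {p}" for p in self.probed]
        return "\n".join(lines)


def locate_eval(workspace: Path, skill_dir: Path) -> EvalLookup:
    # Hypothetical candidate order; the first entry matches this workspace's layout.
    candidates = [
        workspace / "evals" / skill_dir.name / "eval.yaml",
        skill_dir / "eval.yaml",
    ]
    result = EvalLookup()
    for path in candidates:
        result.probed.append(path)
        if path.is_file():
            result.found = path
            break  # stop at the first existing candidate
    return result
```

When the suite is missing, printing locate_eval(workspace, skill_dir).report() shows exactly which locations were checked. That is usually enough to tell a naming mismatch apart from an unexpected invocation pattern.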