
waza check eval discovery is inconsistent with coverage and run in multi-skill workspaces #238

Description

@drvoss

Summary

In a multi-skill workspace, both waza coverage . and waza run evals/.../eval.yaml recognize the eval structure, but waza check does not consistently discover the corresponding eval suite.

This creates a confusing user experience: the workspace appears valid to coverage and run, yet waza check reports either that no eval suite exists or that it cannot find the skill at all, depending on how the skill is specified.

Environment

  • Waza version: 0.31.0
  • OS: Windows
  • Shell: PowerShell

Workspace shape

This was reproduced in a repository with this structure:

repo/
├── .waza.yaml
├── skills/
│   ├── development/skill-creator/SKILL.md
│   ├── copilot-exclusive/mcp-ecosystem/SKILL.md
│   ├── copilot-exclusive/fleet-parallel/SKILL.md
│   ├── testing/eval-harness/SKILL.md
│   └── copilot-exclusive/background-agent/SKILL.md
└── evals/
    ├── skill-creator/eval.yaml
    ├── mcp-ecosystem/eval.yaml
    ├── fleet-parallel/eval.yaml
    ├── eval-harness/eval.yaml
    └── background-agent/eval.yaml

The root .waza.yaml includes:

paths:
  skills: skills/
  evals: evals/
  results: results/

server:
  resultsDir: results

Steps to reproduce

From the workspace root:

waza coverage .
waza check skills/development/skill-creator
waza check skill-creator
waza run evals/skill-creator/eval.yaml --task create-dev-skill-001 --trials 1 --output-dir results -v

Actual behavior

waza coverage .

Coverage recognizes the pilot evals and shows the skill as partially covered.

Example outcome:

  • skill-creator appears with 2 tasks
  • other pilot skills also appear in the coverage grid

waza run evals/skill-creator/eval.yaml ...

The run gets far enough to:

  • load the skill directories
  • run the selected task
  • apply graders
  • save a result JSON file under results/...

waza check skills/development/skill-creator

Output includes:

Evaluation Suite: Not Found

waza check skill-creator

Output includes:

no SKILL.md found in <workspace root>

Expected behavior

waza check should discover the corresponding eval suite consistently when:

  1. the workspace has a valid root .waza.yaml
  2. the skill exists under skills/.../<skill-name>/SKILL.md
  3. the eval exists under evals/<skill-name>/eval.yaml
  4. coverage and run already succeed against the same workspace
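To make the expected rule concrete, here is a minimal Python sketch of the mapping described in points 2 and 3. It is not waza's implementation: the helper names find_skill_dir and resolve_eval are hypothetical, the skills/ and evals/ defaults are assumed to come from the paths section of .waza.yaml, and the eval is assumed to be keyed by the skill's directory name. It accepts either a skill path or a bare skill name, and it returns every location it probed, which is the kind of debug output suggested under Possible improvements.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def find_skill_dir(root: Path, arg: str, skills_dir: str, tried: List[str]) -> Optional[Path]:
    """Hypothetical helper: resolve a check argument to a skill directory."""
    # Case 1: the argument is a path (relative to the workspace root) to a skill directory.
    direct = root / arg / "SKILL.md"
    tried.append(str(direct))
    if direct.is_file():
        return direct.parent
    # Case 2: the argument is a bare skill name; search skills/**/<name>/SKILL.md at any depth.
    tried.append(str(root / skills_dir / "**" / arg / "SKILL.md"))
    matches = sorted(
        p.parent for p in (root / skills_dir).rglob("SKILL.md") if p.parent.name == arg
    )
    # Assumption: an ambiguous bare name (two skills with the same name) is rejected.
    return matches[0] if len(matches) == 1 else None


def resolve_eval(
    root: Path, arg: str, skills_dir: str = "skills", evals_dir: str = "evals"
) -> Tuple[Optional[Path], List[str]]:
    """Hypothetical helper: map a skill path or name to evals/<skill-name>/eval.yaml.

    Returns the eval path (or None) plus every path that was probed.
    """
    tried: List[str] = []
    skill = find_skill_dir(root, arg, skills_dir, tried)
    if skill is None:
        return None, tried
    # Assumed convention from this workspace: the eval directory matches the skill's directory name.
    eval_path = root / evals_dir / skill.name / "eval.yaml"
    tried.append(str(eval_path))
    return (eval_path if eval_path.is_file() else None), tried
```

Under this assumed rule, running from the workspace root, both resolve_eval(Path("."), "skills/development/skill-creator") and resolve_eval(Path("."), "skill-creator") would point at evals/skill-creator/eval.yaml, and a failed lookup would still report which paths were checked.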

At minimum, check should either:

  • locate the matching eval and report it, or
  • explain the discovery rule clearly enough that the user can resolve the mismatch

Why this matters

For new users, check looks like the natural readiness command. When it disagrees with coverage and run, it is hard to tell whether:

  • the workspace is invalid
  • the eval naming/path convention is wrong
  • or check is using different discovery rules

This makes initial adoption harder, especially in multi-skill repositories.

Additional note

This may be either:

  • a discovery bug in check, or
  • a documentation/UX gap where check expects a different invocation pattern than users would infer from coverage and run

Either way, the current behavior is surprising.

Possible improvements

Any of these would help:

  1. Make check reuse the same discovery logic as coverage
  2. Document the exact resolution rule from skill path/name to eval path
  3. Add debug output showing where check looked for the eval
  4. Improve the error message when the skill name/path is interpreted differently than expected
