Fixing skill loading to use .waza.yaml#131
Conversation
- Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path - Changed projectconfig.Load() to set the new ProjectConfig.Dir folder to the absolute path where we loaded the project configuration from. This lets us use the proper absolute path when we're loading or looking for skills in a different folder than where the .waza.yaml is located. - Fix workspace.DetectContext to allow for an absolute path to a skills dir or evals dir, in addition to it's normal heuristics. - Added in a bunch of tests, with a fixed but common layout, to show how things are resolved, depending on cwd. The rules are: - If you're in the evals folder, you're scoping it to choose evals. If you're in a _specific_ evals folder only that eval runs. - If you're at the root (ie, with .waza.yaml), then all evals run. - If you're in skills, corresponding evals will run. Small stuff: - renamed skillSpecPath.specPath to evalSpecPath - renamed [BenchmarkSpec (ie: eval.yaml)].Config.RunsPerTest to TrialsPerTask, to match the field name in the YAML.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #131 +/- ##
=======================================
Coverage ? 73.88%
=======================================
Files ? 138
Lines ? 15828
Branches ? 0
=======================================
Hits ? 11694
Misses ? 3287
Partials ? 847
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR updates waza run workspace/config resolution so skills/evals can be located relative to the directory containing .waza.yaml (not just the current working directory), and aligns the eval config “trials” field naming across code and tests.
Changes:
- Track the on-disk config location via
ProjectConfig.Dirand use it to resolve configured skills/evals directories. - Extend workspace detection/eval lookup to support absolute skills/evals directories.
- Rename
Config.RunsPerTest→TrialsPerTaskand update runner/cache/tests accordingly; add CLI tests covering skill/eval resolution by CWD.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/projectconfig/config.go | Add ProjectConfig.Dir; adjust config discovery to return config path + bytes. |
| internal/projectconfig/config_test.go | Assert cfg.Dir is set when walking up to find .waza.yaml. |
| cmd/waza/workspace_resolve.go | Resolve WithSkillsDir / WithEvalsDir relative to cfg.Dir to support config-rooted layouts. |
| internal/workspace/workspace.go | Allow absolute skills dir in DetectContext; allow absolute evals dir in FindEval. |
| cmd/waza/cmd_run.go | Discover workspace skills from configured skills dir and inject as default skill directories; allow overriding Copilot client in tests. |
| cmd/waza/cmd_run_test.go | Add end-to-end-ish tests validating eval selection + skill loading across different CWDs. |
| cmd/waza/cmd_run_workspace_test.go | Update to renamed evalSpecPath field. |
| internal/models/spec.go | Rename RunsPerTest field to TrialsPerTask (YAML key unchanged). |
| internal/models/spec_test.go | Update assertions for renamed trials field. |
| internal/orchestration/runner.go | Use TrialsPerTask when computing digests and running trials. |
| internal/orchestration/*_test.go | Update tests to the renamed trials field. |
| internal/cache/cache.go | Use TrialsPerTask in cache key. |
| internal/cache/cache_test.go | Update tests to the renamed trials field. |
| cmd/waza/testdata/run/eval-skill-search/** | Add test fixture workspace with .waza.yaml pointing to .github/skills and .github/evals. |
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Updates workspace/config path resolution so waza run can reliably discover and load skills/evals based on .waza.yaml (including .github/{skills,evals} layouts), and aligns the eval config field name with the YAML key.
Changes:
- Add
ProjectConfig.Dir(absolute) to anchor config-relative path resolution and use it for workspace detection overrides. - Extend workspace detection / eval lookup to support absolute skills/evals directories (e.g., from
.waza.yaml) and wire discovered skill directories into Copilot session creation by default. - Rename
RunsPerTest→TrialsPerTaskin the eval config model and update call sites/tests accordingly; add workspace-resolution integration tests + testdata.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/workspace/workspace.go | Allow absolute skills/evals paths in workspace detection and eval resolution. |
| internal/projectconfig/config.go | Track absolute config directory via ProjectConfig.Dir; return config path from finder. |
| internal/projectconfig/config_test.go | Assert ProjectConfig.Dir is set when loading config from parent dirs. |
| cmd/waza/workspace_resolve.go | Resolve skills/evals dirs relative to ProjectConfig.Dir before workspace detection. |
| cmd/waza/cmd_run.go | Discover skills from .waza.yaml skills path and inject as default skill dirs for Copilot runs; rename spec-path field. |
| cmd/waza/cmd_run_test.go | Add end-to-end run tests for .github/{skills,evals} workspace layouts using a mocked Copilot client/session. |
| cmd/waza/cmd_run_workspace_test.go | Update spec-path field rename in workspace-path resolution tests. |
| internal/models/spec.go | Rename config field to TrialsPerTask (YAML still trials_per_task, JSON unchanged). |
| internal/models/spec_test.go | Update tests for TrialsPerTask. |
| internal/orchestration/runner.go | Use TrialsPerTask when computing digests and outcomes. |
| internal/orchestration/runner_test.go | Update runner tests for TrialsPerTask. |
| internal/orchestration/runner_orchestration_test.go | Update orchestration tests for TrialsPerTask. |
| internal/orchestration/baseline_test.go | Update baseline tests for TrialsPerTask. |
| internal/cache/cache.go | Include TrialsPerTask in cache key. |
| internal/cache/cache_test.go | Update cache tests to use TrialsPerTask. |
| cmd/waza/testdata/run/eval-skill-search/.waza.yaml | New test workspace config pointing to .github/skills and .github/evals. |
| cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md | New test skill fixture. |
| cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md | New test skill fixture. |
| cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml | New test eval fixture. |
| cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml | New test task fixture. |
| cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml | New test eval fixture. |
| cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml | New test task fixture. |
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
This PR updates waza run workspace/config resolution so that skill discovery and eval lookup honor .waza.yaml (notably .github/skills + .github/evals layouts), and aligns the “trials” config naming in code with the YAML field.
Changes:
- Track the directory containing the loaded
.waza.yamlviaProjectConfig.Dirand use it to resolve configured paths. - Allow workspace detection / eval lookup to accept absolute skills/evals directories (derived from config) and wire those through
waza run. - Rename
RunsPerTest→TrialsPerTaskin the benchmark config and update call sites/tests accordingly; add integration-style tests for various CWD layouts.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
internal/workspace/workspace.go |
Supports absolute skillsDir / evalsDir inputs for workspace detection and separated eval lookup. |
internal/projectconfig/config.go |
Adds ProjectConfig.Dir (directory of discovered .waza.yaml) and updates config file discovery to return the config path. |
internal/projectconfig/config_test.go |
Asserts ProjectConfig.Dir is set when config is discovered by walking up directories. |
cmd/waza/workspace_resolve.go |
Resolves config paths to absolute using cfg.Dir before passing into workspace.DetectContext. |
cmd/waza/cmd_run.go |
Pre-discovers skills from configured skills directory and injects them when the spec doesn’t define skill dirs; adds test hook for Copilot client creation. |
cmd/waza/cmd_run_test.go |
Adds tests covering skill/eval resolution behavior for different working directories using .waza.yaml-configured .github/* layout. |
internal/models/spec.go / internal/models/spec_test.go |
Renames config field to TrialsPerTask and updates validation/tests. |
internal/orchestration/* |
Switches runner logic/tests from RunsPerTest to TrialsPerTask while preserving outcome serialization field naming. |
internal/cache/cache.go / internal/cache/cache_test.go |
Updates cache key generation/tests to use TrialsPerTask. |
Comments suppressed due to low confidence (1)
cmd/waza/cmd_run.go:226
discovery.Discover(skillsPath)does a full recursivefilepath.Walkof the skills tree. This runs before specs are loaded, so it happens even for (a) mock runs, and (b) eval specs that already defineconfig.skill_directories. This can add noticeable overhead in repos with many/large skills. Consider deferring discovery until after the spec is loaded and only performing it whenspec.Config.SkillPathsis empty and the engine requires skills (e.g., copilot-sdk).
discoveredSkills, err := discovery.Discover(skillsPath)
if err != nil {
return err
}
for _, ds := range discoveredSkills {
skillFolders = append(skillFolders, ds.Dir)
}
slog.Info("Workspace skills added", "skills", skillFolders, "base", skillsPath)
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Might want to change this later if we think that implicit defaults were useful, but they weren't affected by this code before, so shouldn't be affected afterwards.
There was a problem hiding this comment.
Pull request overview
This PR updates waza run workspace resolution so skills/evals can be discovered relative to the .waza.yaml location (fixing .github/-style layouts), and renames the eval config field from RunsPerTest to TrialsPerTask across the codebase/tests.
Changes:
- Track the directory that
.waza.yamlwas loaded from (ProjectConfig.Dir) and use it to resolve config-relative skills/evals paths. - Extend workspace detection / eval lookup to handle absolute skills/evals directories (e.g., after resolving from
.waza.yaml). - Add/adjust tests and testdata for the new discovery behavior; rename
RunsPerTest→TrialsPerTaskthroughout.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/workspace/workspace.go | Accept absolute skills/evals dirs during workspace detection and FindEval. |
| internal/projectconfig/config.go | Return config path from discovery and set ProjectConfig.Dir accordingly. |
| internal/projectconfig/config_test.go | Add coverage for ProjectConfig.Dir. |
| cmd/waza/workspace_resolve.go | Resolve skills/evals dirs using cfg.Dir before passing to workspace detection. |
| cmd/waza/cmd_run.go | Discover skills from .waza.yaml paths and feed default skill dirs into Copilot engine; rename skillSpecPath field and thread default skills through execution. |
| cmd/waza/cmd_run_test.go | Add integration-style tests validating skill loading under .github/ workspace layouts; adjust for renamed fields. |
| cmd/waza/cmd_run_workspace_test.go | Update for evalSpecPath rename. |
| cmd/waza/testdata/run/eval-skill-search/** | New test fixtures for .waza.yaml + .github/{skills,evals} layout. |
| internal/models/spec.go | Rename Config.RunsPerTest → TrialsPerTask and update validation. |
| internal/models/spec_test.go | Update tests for renamed config field. |
| internal/orchestration/runner.go | Use TrialsPerTask for digest/statistics and execution loops. |
| internal/orchestration/runner_test.go | Update expected config field. |
| internal/orchestration/runner_orchestration_test.go | Update expected config field. |
| internal/orchestration/baseline_test.go | Update expected config field. |
| internal/cache/cache.go | Include TrialsPerTask in cache key. |
| internal/cache/cache_test.go | Update tests for renamed config field. |
Comments suppressed due to low confidence (1)
cmd/waza/cmd_run.go:375
- With the new absolute skills/evals directory support,
resolveSpecPaths(nil)will return all detected skills even when the current working directory is inside a specific eval folder (e.g.,<evals>/<skill>/). That meanswaza runwith no args from inside an eval directory will run multiple evals instead of scoping to the localeval.yaml, which conflicts with the PR description’s stated rules. Consider adding an early check to treat a localeval.yaml(especially when CWD is under the configured evals dir) as the single resolved spec path, and add a test to lock in that behavior.
// No args — workspace detection
wsCtx, err := resolveWorkspace(nil)
if err != nil {
return nil, fmt.Errorf("no eval.yaml specified and workspace detection failed: %w", err)
}
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes waza run skill discovery for workspaces where skills/evals live outside the repo root (e.g., .github/skills + .github/evals) by making .waza.yaml path settings resolve correctly and plumbing discovered skill directories into Copilot engine sessions by default.
Changes:
- Add
ProjectConfig.Dir(directory containing the discovered.waza.yaml) and use it to derive absolute skills/evals paths for workspace detection andwaza runskill discovery. - Update workspace detection / eval lookup to correctly handle absolute
skillsDir/evalsDiroptions. - Rename
RunsPerTest→TrialsPerTaskin the eval spec model and update runner/cache/tests accordingly; add regression tests + testdata for.github/*layouts.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
internal/workspace/workspace.go |
Allow absolute skillsDir/evalsDir inputs during detection and eval resolution. |
internal/projectconfig/config.go |
Add ProjectConfig.Dir; return config path from discovery to support config-relative resolution. |
internal/projectconfig/config_test.go |
Assert cfg.Dir is set correctly when .waza.yaml is discovered by walking up. |
cmd/waza/workspace_resolve.go |
Build workspace detect options using config-relative (absolute) skills/evals paths. |
cmd/waza/cmd_run.go |
Discover workspace skills from .waza.yaml and inject them as default skill directories; add overridable Copilot client factory for tests. |
cmd/waza/cmd_run_test.go |
Add integration-style tests covering cwd-dependent resolution with .github/skills + .github/evals; update helpers/signatures. |
cmd/waza/cmd_run_workspace_test.go |
Update field rename usage (evalSpecPath). |
cmd/waza/testdata/run/eval-skill-search/.waza.yaml |
New test fixture config pointing to .github/skills and .github/evals. |
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md |
New skill fixture for workspace resolution tests. |
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md |
New skill fixture for workspace resolution tests. |
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml |
New eval fixture for workspace resolution tests. |
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml |
New task fixture for workspace resolution tests. |
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml |
New eval fixture for workspace resolution tests. |
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml |
New task fixture for workspace resolution tests. |
internal/models/spec.go |
Rename config field to TrialsPerTask (YAML trials_per_task) and update validation. |
internal/models/spec_test.go |
Update spec parsing tests to validate TrialsPerTask. |
internal/orchestration/runner.go |
Use TrialsPerTask when computing digests and per-task run counts. |
internal/orchestration/runner_test.go |
Update runner tests for TrialsPerTask rename. |
internal/orchestration/runner_orchestration_test.go |
Update orchestration tests for TrialsPerTask rename. |
internal/orchestration/baseline_test.go |
Update baseline tests for TrialsPerTask rename. |
internal/cache/cache.go |
Include TrialsPerTask in cache key generation. |
internal/cache/cache_test.go |
Update cache tests for TrialsPerTask rename. |
You can also share your feedback on Copilot code review. Take the survey.
Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path
More stuff:
Small stuff:
Fixes #125