Fixing skill loading to use .waza.yaml by richardpark-msft · Pull Request #131 · microsoft/waza

richardpark-msft · 2026-03-16T21:51:22Z

Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path

More stuff:

Changed projectconfig.Load() to set the new ProjectConfig.Dir folder to the absolute path where we loaded the project configuration from. This lets us use the proper absolute path when we're loading or looking for skills in a different folder than where the .waza.yaml is located.
Fix workspace.DetectContext to allow for an absolute path to a skills dir or evals dir, in addition to it's normal heuristics.
Added in a bunch of tests, with a fixed but common layout, to show how things are resolved, depending on cwd. The rules are:
If you're in the evals folder, you're scoping it to choose evals. If you're in a specific evals folder only that eval runs.
If you're at the root (ie, with .waza.yaml), then all evals run.
If you're in skills, corresponding evals will run.

Small stuff:

renamed skillSpecPath.specPath to evalSpecPath
renamed [BenchmarkSpec (ie: eval.yaml)].Config.RunsPerTest to TrialsPerTask, to match the field name in the YAML.

Fixes #125

- Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path - Changed projectconfig.Load() to set the new ProjectConfig.Dir folder to the absolute path where we loaded the project configuration from. This lets us use the proper absolute path when we're loading or looking for skills in a different folder than where the .waza.yaml is located. - Fix workspace.DetectContext to allow for an absolute path to a skills dir or evals dir, in addition to it's normal heuristics. - Added in a bunch of tests, with a fixed but common layout, to show how things are resolved, depending on cwd. The rules are: - If you're in the evals folder, you're scoping it to choose evals. If you're in a _specific_ evals folder only that eval runs. - If you're at the root (ie, with .waza.yaml), then all evals run. - If you're in skills, corresponding evals will run. Small stuff: - renamed skillSpecPath.specPath to evalSpecPath - renamed [BenchmarkSpec (ie: eval.yaml)].Config.RunsPerTest to TrialsPerTask, to match the field name in the YAML.

codecov-commenter · 2026-03-16T21:56:03Z

Codecov Report

❌ Patch coverage is 68.35443% with 25 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@3068653). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
cmd/waza/cmd_run.go	66.66%	13 Missing and 1 partial ⚠️
internal/workspace/workspace.go	61.53%	2 Missing and 3 partials ⚠️
internal/models/spec.go	0.00%	1 Missing and 1 partial ⚠️
internal/projectconfig/config.go	71.42%	2 Missing ⚠️
cmd/waza/workspace_resolve.go	90.90%	1 Missing ⚠️
internal/cache/cache.go	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #131   +/-   ##
=======================================
  Coverage        ?   73.88%           
=======================================
  Files           ?      138           
  Lines           ?    15828           
  Branches        ?        0           
=======================================
  Hits            ?    11694           
  Misses          ?     3287           
  Partials        ?      847

Flag	Coverage Δ
go-implementation	`73.88% <68.35%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

This PR updates waza run workspace/config resolution so skills/evals can be located relative to the directory containing .waza.yaml (not just the current working directory), and aligns the eval config “trials” field naming across code and tests.

Changes:

Track the on-disk config location via ProjectConfig.Dir and use it to resolve configured skills/evals directories.
Extend workspace detection/eval lookup to support absolute skills/evals directories.
Rename Config.RunsPerTest → TrialsPerTask and update runner/cache/tests accordingly; add CLI tests covering skill/eval resolution by CWD.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
internal/projectconfig/config.go	Add `ProjectConfig.Dir`; adjust config discovery to return config path + bytes.
internal/projectconfig/config_test.go	Assert `cfg.Dir` is set when walking up to find `.waza.yaml`.
cmd/waza/workspace_resolve.go	Resolve `WithSkillsDir` / `WithEvalsDir` relative to `cfg.Dir` to support config-rooted layouts.
internal/workspace/workspace.go	Allow absolute skills dir in `DetectContext`; allow absolute evals dir in `FindEval`.
cmd/waza/cmd_run.go	Discover workspace skills from configured skills dir and inject as default skill directories; allow overriding Copilot client in tests.
cmd/waza/cmd_run_test.go	Add end-to-end-ish tests validating eval selection + skill loading across different CWDs.
cmd/waza/cmd_run_workspace_test.go	Update to renamed `evalSpecPath` field.
internal/models/spec.go	Rename `RunsPerTest` field to `TrialsPerTask` (YAML key unchanged).
internal/models/spec_test.go	Update assertions for renamed trials field.
internal/orchestration/runner.go	Use `TrialsPerTask` when computing digests and running trials.
internal/orchestration/*_test.go	Update tests to the renamed trials field.
internal/cache/cache.go	Use `TrialsPerTask` in cache key.
internal/cache/cache_test.go	Update tests to the renamed trials field.
cmd/waza/testdata/run/eval-skill-search/**	Add test fixture workspace with `.waza.yaml` pointing to `.github/skills` and `.github/evals`.

You can also share your feedback on Copilot code review. Take the survey.

Copilot

Pull request overview

Updates workspace/config path resolution so waza run can reliably discover and load skills/evals based on .waza.yaml (including .github/{skills,evals} layouts), and aligns the eval config field name with the YAML key.

Changes:

Add ProjectConfig.Dir (absolute) to anchor config-relative path resolution and use it for workspace detection overrides.
Extend workspace detection / eval lookup to support absolute skills/evals directories (e.g., from .waza.yaml) and wire discovered skill directories into Copilot session creation by default.
Rename RunsPerTest → TrialsPerTask in the eval config model and update call sites/tests accordingly; add workspace-resolution integration tests + testdata.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
internal/workspace/workspace.go	Allow absolute skills/evals paths in workspace detection and eval resolution.
internal/projectconfig/config.go	Track absolute config directory via `ProjectConfig.Dir`; return config path from finder.
internal/projectconfig/config_test.go	Assert `ProjectConfig.Dir` is set when loading config from parent dirs.
cmd/waza/workspace_resolve.go	Resolve skills/evals dirs relative to `ProjectConfig.Dir` before workspace detection.
cmd/waza/cmd_run.go	Discover skills from `.waza.yaml` skills path and inject as default skill dirs for Copilot runs; rename spec-path field.
cmd/waza/cmd_run_test.go	Add end-to-end run tests for `.github/{skills,evals}` workspace layouts using a mocked Copilot client/session.
cmd/waza/cmd_run_workspace_test.go	Update spec-path field rename in workspace-path resolution tests.
internal/models/spec.go	Rename config field to `TrialsPerTask` (YAML still `trials_per_task`, JSON unchanged).
internal/models/spec_test.go	Update tests for `TrialsPerTask`.
internal/orchestration/runner.go	Use `TrialsPerTask` when computing digests and outcomes.
internal/orchestration/runner_test.go	Update runner tests for `TrialsPerTask`.
internal/orchestration/runner_orchestration_test.go	Update orchestration tests for `TrialsPerTask`.
internal/orchestration/baseline_test.go	Update baseline tests for `TrialsPerTask`.
internal/cache/cache.go	Include `TrialsPerTask` in cache key.
internal/cache/cache_test.go	Update cache tests to use `TrialsPerTask`.
cmd/waza/testdata/run/eval-skill-search/.waza.yaml	New test workspace config pointing to `.github/skills` and `.github/evals`.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md	New test skill fixture.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md	New test skill fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml	New test eval fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml	New test task fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml	New test eval fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml	New test task fixture.

You can also share your feedback on Copilot code review. Take the survey.

Copilot

Pull request overview

This PR updates waza run workspace/config resolution so that skill discovery and eval lookup honor .waza.yaml (notably .github/skills + .github/evals layouts), and aligns the “trials” config naming in code with the YAML field.

Changes:

Track the directory containing the loaded .waza.yaml via ProjectConfig.Dir and use it to resolve configured paths.
Allow workspace detection / eval lookup to accept absolute skills/evals directories (derived from config) and wire those through waza run.
Rename RunsPerTest → TrialsPerTask in the benchmark config and update call sites/tests accordingly; add integration-style tests for various CWD layouts.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`internal/workspace/workspace.go`	Supports absolute `skillsDir` / `evalsDir` inputs for workspace detection and separated eval lookup.
`internal/projectconfig/config.go`	Adds `ProjectConfig.Dir` (directory of discovered `.waza.yaml`) and updates config file discovery to return the config path.
`internal/projectconfig/config_test.go`	Asserts `ProjectConfig.Dir` is set when config is discovered by walking up directories.
`cmd/waza/workspace_resolve.go`	Resolves config paths to absolute using `cfg.Dir` before passing into `workspace.DetectContext`.
`cmd/waza/cmd_run.go`	Pre-discovers skills from configured skills directory and injects them when the spec doesn’t define skill dirs; adds test hook for Copilot client creation.
`cmd/waza/cmd_run_test.go`	Adds tests covering skill/eval resolution behavior for different working directories using `.waza.yaml`-configured `.github/*` layout.
`internal/models/spec.go` / `internal/models/spec_test.go`	Renames config field to `TrialsPerTask` and updates validation/tests.
`internal/orchestration/*`	Switches runner logic/tests from `RunsPerTest` to `TrialsPerTask` while preserving outcome serialization field naming.
`internal/cache/cache.go` / `internal/cache/cache_test.go`	Updates cache key generation/tests to use `TrialsPerTask`.

Comments suppressed due to low confidence (1)

cmd/waza/cmd_run.go:226

discovery.Discover(skillsPath) does a full recursive filepath.Walk of the skills tree. This runs before specs are loaded, so it happens even for (a) mock runs, and (b) eval specs that already define config.skill_directories. This can add noticeable overhead in repos with many/large skills. Consider deferring discovery until after the spec is loaded and only performing it when spec.Config.SkillPaths is empty and the engine requires skills (e.g., copilot-sdk).

			discoveredSkills, err := discovery.Discover(skillsPath)

			if err != nil {
				return err
			}

			for _, ds := range discoveredSkills {
				skillFolders = append(skillFolders, ds.Dir)
			}

			slog.Info("Workspace skills added", "skills", skillFolders, "base", skillsPath)

You can also share your feedback on Copilot code review. Take the survey.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Might want to change this later if we think that implicit defaults were useful, but they weren't affected by this code before, so shouldn't be affected afterwards.

Copilot

Pull request overview

This PR updates waza run workspace resolution so skills/evals can be discovered relative to the .waza.yaml location (fixing .github/-style layouts), and renames the eval config field from RunsPerTest to TrialsPerTask across the codebase/tests.

Changes:

Track the directory that .waza.yaml was loaded from (ProjectConfig.Dir) and use it to resolve config-relative skills/evals paths.
Extend workspace detection / eval lookup to handle absolute skills/evals directories (e.g., after resolving from .waza.yaml).
Add/adjust tests and testdata for the new discovery behavior; rename RunsPerTest → TrialsPerTask throughout.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
internal/workspace/workspace.go	Accept absolute skills/evals dirs during workspace detection and `FindEval`.
internal/projectconfig/config.go	Return config path from discovery and set `ProjectConfig.Dir` accordingly.
internal/projectconfig/config_test.go	Add coverage for `ProjectConfig.Dir`.
cmd/waza/workspace_resolve.go	Resolve skills/evals dirs using `cfg.Dir` before passing to workspace detection.
cmd/waza/cmd_run.go	Discover skills from `.waza.yaml` paths and feed default skill dirs into Copilot engine; rename `skillSpecPath` field and thread default skills through execution.
cmd/waza/cmd_run_test.go	Add integration-style tests validating skill loading under `.github/` workspace layouts; adjust for renamed fields.
cmd/waza/cmd_run_workspace_test.go	Update for `evalSpecPath` rename.
cmd/waza/testdata/run/eval-skill-search/**	New test fixtures for `.waza.yaml` + `.github/{skills,evals}` layout.
internal/models/spec.go	Rename `Config.RunsPerTest` → `TrialsPerTask` and update validation.
internal/models/spec_test.go	Update tests for renamed config field.
internal/orchestration/runner.go	Use `TrialsPerTask` for digest/statistics and execution loops.
internal/orchestration/runner_test.go	Update expected config field.
internal/orchestration/runner_orchestration_test.go	Update expected config field.
internal/orchestration/baseline_test.go	Update expected config field.
internal/cache/cache.go	Include `TrialsPerTask` in cache key.
internal/cache/cache_test.go	Update tests for renamed config field.

Comments suppressed due to low confidence (1)

cmd/waza/cmd_run.go:375

With the new absolute skills/evals directory support, resolveSpecPaths(nil) will return all detected skills even when the current working directory is inside a specific eval folder (e.g., <evals>/<skill>/). That means waza run with no args from inside an eval directory will run multiple evals instead of scoping to the local eval.yaml, which conflicts with the PR description’s stated rules. Consider adding an early check to treat a local eval.yaml (especially when CWD is under the configured evals dir) as the single resolved spec path, and add a test to lock in that behavior.

	// No args — workspace detection
	wsCtx, err := resolveWorkspace(nil)
	if err != nil {
		return nil, fmt.Errorf("no eval.yaml specified and workspace detection failed: %w", err)
	}

You can also share your feedback on Copilot code review. Take the survey.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR fixes waza run skill discovery for workspaces where skills/evals live outside the repo root (e.g., .github/skills + .github/evals) by making .waza.yaml path settings resolve correctly and plumbing discovered skill directories into Copilot engine sessions by default.

Changes:

Add ProjectConfig.Dir (directory containing the discovered .waza.yaml) and use it to derive absolute skills/evals paths for workspace detection and waza run skill discovery.
Update workspace detection / eval lookup to correctly handle absolute skillsDir/evalsDir options.
Rename RunsPerTest → TrialsPerTask in the eval spec model and update runner/cache/tests accordingly; add regression tests + testdata for .github/* layouts.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`internal/workspace/workspace.go`	Allow absolute `skillsDir`/`evalsDir` inputs during detection and eval resolution.
`internal/projectconfig/config.go`	Add `ProjectConfig.Dir`; return config path from discovery to support config-relative resolution.
`internal/projectconfig/config_test.go`	Assert `cfg.Dir` is set correctly when `.waza.yaml` is discovered by walking up.
`cmd/waza/workspace_resolve.go`	Build workspace detect options using config-relative (absolute) skills/evals paths.
`cmd/waza/cmd_run.go`	Discover workspace skills from `.waza.yaml` and inject them as default skill directories; add overridable Copilot client factory for tests.
`cmd/waza/cmd_run_test.go`	Add integration-style tests covering cwd-dependent resolution with `.github/skills` + `.github/evals`; update helpers/signatures.
`cmd/waza/cmd_run_workspace_test.go`	Update field rename usage (`evalSpecPath`).
`cmd/waza/testdata/run/eval-skill-search/.waza.yaml`	New test fixture config pointing to `.github/skills` and `.github/evals`.
`cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md`	New skill fixture for workspace resolution tests.
`cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md`	New skill fixture for workspace resolution tests.
`cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml`	New eval fixture for workspace resolution tests.
`cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml`	New task fixture for workspace resolution tests.
`cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml`	New eval fixture for workspace resolution tests.
`cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml`	New task fixture for workspace resolution tests.
`internal/models/spec.go`	Rename config field to `TrialsPerTask` (YAML `trials_per_task`) and update validation.
`internal/models/spec_test.go`	Update spec parsing tests to validate `TrialsPerTask`.
`internal/orchestration/runner.go`	Use `TrialsPerTask` when computing digests and per-task run counts.
`internal/orchestration/runner_test.go`	Update runner tests for `TrialsPerTask` rename.
`internal/orchestration/runner_orchestration_test.go`	Update orchestration tests for `TrialsPerTask` rename.
`internal/orchestration/baseline_test.go`	Update baseline tests for `TrialsPerTask` rename.
`internal/cache/cache.go`	Include `TrialsPerTask` in cache key generation.
`internal/cache/cache_test.go`	Update cache tests for `TrialsPerTask` rename.

You can also share your feedback on Copilot code review. Take the survey.

into wz-yaml-fixes

Copilot AI review requested due to automatic review settings March 16, 2026 21:51

richardpark-msft requested review from chlowell, spboyer and wbreza as code owners March 16, 2026 21:51

github-actions Bot enabled auto-merge (squash) March 16, 2026 21:51

Copilot started reviewing on behalf of richardpark-msft March 16, 2026 21:52 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Comment thread cmd/waza/cmd_run.go

Comment thread internal/cache/cache_test.go

richardpark-msft added 3 commits March 16, 2026 16:12

Fix path comparison issues.

0e79cac

Break out the warnings a bit more.

0ac671f

temp debug, something interesting happening with temp path.

165902c

Copilot AI review requested due to automatic review settings March 16, 2026 23:46

Copilot started reviewing on behalf of richardpark-msft March 16, 2026 23:46 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

Comment thread cmd/waza/cmd_run.go Outdated

Comment thread cmd/waza/cmd_run_test.go

richardpark-msft added 2 commits March 16, 2026 17:16

Changing paths so I'm less dependent on whether LFNs are enabled.

04321c5

Fixing error message - copy/paste error.

b58fe75

Copilot AI review requested due to automatic review settings March 17, 2026 00:18

Copilot started reviewing on behalf of richardpark-msft March 17, 2026 00:18 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

Comment thread cmd/waza/cmd_run_test.go

Comment thread cmd/waza/cmd_run_test.go Outdated

Comment thread internal/projectconfig/config.go Outdated

Comment thread cmd/waza/cmd_run.go

Potential fix for pull request finding

4bf292b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 17, 2026 00:25

Copilot started reviewing on behalf of richardpark-msft March 17, 2026 00:26 View session

richardpark-msft added 2 commits March 16, 2026 17:26

Fixing a docs issue

f335208

use .Dir as a reasonable proxy for "user has a .waza.yaml file".

77017cc

Might want to change this later if we think that implicit defaults were useful, but they weren't affected by this code before, so shouldn't be affected afterwards.

Copilot AI reviewed Mar 17, 2026

View reviewed changes

Comment thread cmd/waza/cmd_run.go

Comment thread internal/projectconfig/config_test.go Outdated

Potential fix for pull request finding

03f1a0e

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 17, 2026 00:32

Copilot started reviewing on behalf of richardpark-msft March 17, 2026 00:33 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

Comment thread internal/projectconfig/config.go

Comment thread cmd/waza/cmd_run.go Outdated

Comment thread cmd/waza/cmd_run_test.go

richardpark-msft added 2 commits March 16, 2026 17:40

Had two checks on cfg.Dir!

520796f

Merge branch 'wz-yaml-fixes' of https://github.com/richardpark-msft/waza

708d99c

into wz-yaml-fixes

chlowell approved these changes Mar 17, 2026

View reviewed changes

Comment thread cmd/waza/cmd_run.go

github-actions Bot merged commit a583291 into microsoft:main Mar 17, 2026
6 checks passed

richardpark-msft deleted the wz-yaml-fixes branch March 17, 2026 16:42

richardpark-msft mentioned this pull request Mar 18, 2026

Skills aren't loaded properly when doing a 'waza run' with a .github/ style workspace #125

Closed

Uh oh!

Conversation

richardpark-msft commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

richardpark-msft commented Mar 16, 2026 •

edited

Loading

codecov-commenter commented Mar 16, 2026 •

edited

Loading