Skip to content

Fixing skill loading to use .waza.yaml#131

Merged
github-actions[bot] merged 12 commits into
microsoft:mainfrom
richardpark-msft:wz-yaml-fixes
Mar 17, 2026
Merged

Fixing skill loading to use .waza.yaml#131
github-actions[bot] merged 12 commits into
microsoft:mainfrom
richardpark-msft:wz-yaml-fixes

Conversation

@richardpark-msft

@richardpark-msft richardpark-msft commented Mar 16, 2026

Copy link
Copy Markdown
Member

Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path

More stuff:

  • Changed projectconfig.Load() to set the new ProjectConfig.Dir folder to the absolute path where we loaded the project configuration from. This lets us use the proper absolute path when we're loading or looking for skills in a different folder than where the .waza.yaml is located.
  • Fix workspace.DetectContext to allow for an absolute path to a skills dir or evals dir, in addition to it's normal heuristics.
  • Added in a bunch of tests, with a fixed but common layout, to show how things are resolved, depending on cwd. The rules are:
  • If you're in the evals folder, you're scoping it to choose evals. If you're in a specific evals folder only that eval runs.
  • If you're at the root (ie, with .waza.yaml), then all evals run.
  • If you're in skills, corresponding evals will run.

Small stuff:

  • renamed skillSpecPath.specPath to evalSpecPath
  • renamed [BenchmarkSpec (ie: eval.yaml)].Config.RunsPerTest to TrialsPerTask, to match the field name in the YAML.

Fixes #125

- Make it so waza run loads up the skill folder in .waza.yaml, unless overridden by the user entering their own skill path
- Changed projectconfig.Load() to set the new ProjectConfig.Dir folder to the absolute path where we loaded the project configuration from. This lets us use the proper absolute path when we're loading or looking for skills in a different folder than where the .waza.yaml is located.
- Fix workspace.DetectContext to allow for an absolute path to a skills dir or evals dir, in addition to it's normal heuristics.
- Added in a bunch of tests, with a fixed but common layout, to show how things are resolved, depending on cwd. The rules are:
  - If you're in the evals folder, you're scoping it to choose evals. If you're in a _specific_ evals folder only that eval runs.
  - If you're at the root (ie, with .waza.yaml), then all evals run.
  - If you're in skills, corresponding evals will run.

Small stuff:
- renamed skillSpecPath.specPath to evalSpecPath
- renamed [BenchmarkSpec (ie: eval.yaml)].Config.RunsPerTest to TrialsPerTask, to match the field name in the YAML.
Copilot AI review requested due to automatic review settings March 16, 2026 21:51
@github-actions github-actions Bot enabled auto-merge (squash) March 16, 2026 21:51
@codecov-commenter

codecov-commenter commented Mar 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 68.35443% with 25 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@3068653). Learn more about missing BASE report.

Files with missing lines Patch % Lines
cmd/waza/cmd_run.go 66.66% 13 Missing and 1 partial ⚠️
internal/workspace/workspace.go 61.53% 2 Missing and 3 partials ⚠️
internal/models/spec.go 0.00% 1 Missing and 1 partial ⚠️
internal/projectconfig/config.go 71.42% 2 Missing ⚠️
cmd/waza/workspace_resolve.go 90.90% 1 Missing ⚠️
internal/cache/cache.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #131   +/-   ##
=======================================
  Coverage        ?   73.88%           
=======================================
  Files           ?      138           
  Lines           ?    15828           
  Branches        ?        0           
=======================================
  Hits            ?    11694           
  Misses          ?     3287           
  Partials        ?      847           
Flag Coverage Δ
go-implementation 73.88% <68.35%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates waza run workspace/config resolution so skills/evals can be located relative to the directory containing .waza.yaml (not just the current working directory), and aligns the eval config “trials” field naming across code and tests.

Changes:

  • Track the on-disk config location via ProjectConfig.Dir and use it to resolve configured skills/evals directories.
  • Extend workspace detection/eval lookup to support absolute skills/evals directories.
  • Rename Config.RunsPerTestTrialsPerTask and update runner/cache/tests accordingly; add CLI tests covering skill/eval resolution by CWD.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/projectconfig/config.go Add ProjectConfig.Dir; adjust config discovery to return config path + bytes.
internal/projectconfig/config_test.go Assert cfg.Dir is set when walking up to find .waza.yaml.
cmd/waza/workspace_resolve.go Resolve WithSkillsDir / WithEvalsDir relative to cfg.Dir to support config-rooted layouts.
internal/workspace/workspace.go Allow absolute skills dir in DetectContext; allow absolute evals dir in FindEval.
cmd/waza/cmd_run.go Discover workspace skills from configured skills dir and inject as default skill directories; allow overriding Copilot client in tests.
cmd/waza/cmd_run_test.go Add end-to-end-ish tests validating eval selection + skill loading across different CWDs.
cmd/waza/cmd_run_workspace_test.go Update to renamed evalSpecPath field.
internal/models/spec.go Rename RunsPerTest field to TrialsPerTask (YAML key unchanged).
internal/models/spec_test.go Update assertions for renamed trials field.
internal/orchestration/runner.go Use TrialsPerTask when computing digests and running trials.
internal/orchestration/*_test.go Update tests to the renamed trials field.
internal/cache/cache.go Use TrialsPerTask in cache key.
internal/cache/cache_test.go Update tests to the renamed trials field.
cmd/waza/testdata/run/eval-skill-search/** Add test fixture workspace with .waza.yaml pointing to .github/skills and .github/evals.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread cmd/waza/cmd_run.go
Comment thread internal/cache/cache_test.go
Copilot AI review requested due to automatic review settings March 16, 2026 23:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates workspace/config path resolution so waza run can reliably discover and load skills/evals based on .waza.yaml (including .github/{skills,evals} layouts), and aligns the eval config field name with the YAML key.

Changes:

  • Add ProjectConfig.Dir (absolute) to anchor config-relative path resolution and use it for workspace detection overrides.
  • Extend workspace detection / eval lookup to support absolute skills/evals directories (e.g., from .waza.yaml) and wire discovered skill directories into Copilot session creation by default.
  • Rename RunsPerTestTrialsPerTask in the eval config model and update call sites/tests accordingly; add workspace-resolution integration tests + testdata.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/workspace/workspace.go Allow absolute skills/evals paths in workspace detection and eval resolution.
internal/projectconfig/config.go Track absolute config directory via ProjectConfig.Dir; return config path from finder.
internal/projectconfig/config_test.go Assert ProjectConfig.Dir is set when loading config from parent dirs.
cmd/waza/workspace_resolve.go Resolve skills/evals dirs relative to ProjectConfig.Dir before workspace detection.
cmd/waza/cmd_run.go Discover skills from .waza.yaml skills path and inject as default skill dirs for Copilot runs; rename spec-path field.
cmd/waza/cmd_run_test.go Add end-to-end run tests for .github/{skills,evals} workspace layouts using a mocked Copilot client/session.
cmd/waza/cmd_run_workspace_test.go Update spec-path field rename in workspace-path resolution tests.
internal/models/spec.go Rename config field to TrialsPerTask (YAML still trials_per_task, JSON unchanged).
internal/models/spec_test.go Update tests for TrialsPerTask.
internal/orchestration/runner.go Use TrialsPerTask when computing digests and outcomes.
internal/orchestration/runner_test.go Update runner tests for TrialsPerTask.
internal/orchestration/runner_orchestration_test.go Update orchestration tests for TrialsPerTask.
internal/orchestration/baseline_test.go Update baseline tests for TrialsPerTask.
internal/cache/cache.go Include TrialsPerTask in cache key.
internal/cache/cache_test.go Update cache tests to use TrialsPerTask.
cmd/waza/testdata/run/eval-skill-search/.waza.yaml New test workspace config pointing to .github/skills and .github/evals.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md New test skill fixture.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md New test skill fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml New test eval fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml New test task fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml New test eval fixture.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml New test task fixture.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread cmd/waza/cmd_run.go Outdated
Comment thread cmd/waza/cmd_run_test.go
Copilot AI review requested due to automatic review settings March 17, 2026 00:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates waza run workspace/config resolution so that skill discovery and eval lookup honor .waza.yaml (notably .github/skills + .github/evals layouts), and aligns the “trials” config naming in code with the YAML field.

Changes:

  • Track the directory containing the loaded .waza.yaml via ProjectConfig.Dir and use it to resolve configured paths.
  • Allow workspace detection / eval lookup to accept absolute skills/evals directories (derived from config) and wire those through waza run.
  • Rename RunsPerTestTrialsPerTask in the benchmark config and update call sites/tests accordingly; add integration-style tests for various CWD layouts.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/workspace/workspace.go Supports absolute skillsDir / evalsDir inputs for workspace detection and separated eval lookup.
internal/projectconfig/config.go Adds ProjectConfig.Dir (directory of discovered .waza.yaml) and updates config file discovery to return the config path.
internal/projectconfig/config_test.go Asserts ProjectConfig.Dir is set when config is discovered by walking up directories.
cmd/waza/workspace_resolve.go Resolves config paths to absolute using cfg.Dir before passing into workspace.DetectContext.
cmd/waza/cmd_run.go Pre-discovers skills from configured skills directory and injects them when the spec doesn’t define skill dirs; adds test hook for Copilot client creation.
cmd/waza/cmd_run_test.go Adds tests covering skill/eval resolution behavior for different working directories using .waza.yaml-configured .github/* layout.
internal/models/spec.go / internal/models/spec_test.go Renames config field to TrialsPerTask and updates validation/tests.
internal/orchestration/* Switches runner logic/tests from RunsPerTest to TrialsPerTask while preserving outcome serialization field naming.
internal/cache/cache.go / internal/cache/cache_test.go Updates cache key generation/tests to use TrialsPerTask.
Comments suppressed due to low confidence (1)

cmd/waza/cmd_run.go:226

  • discovery.Discover(skillsPath) does a full recursive filepath.Walk of the skills tree. This runs before specs are loaded, so it happens even for (a) mock runs, and (b) eval specs that already define config.skill_directories. This can add noticeable overhead in repos with many/large skills. Consider deferring discovery until after the spec is loaded and only performing it when spec.Config.SkillPaths is empty and the engine requires skills (e.g., copilot-sdk).
			discoveredSkills, err := discovery.Discover(skillsPath)

			if err != nil {
				return err
			}

			for _, ds := range discoveredSkills {
				skillFolders = append(skillFolders, ds.Dir)
			}

			slog.Info("Workspace skills added", "skills", skillFolders, "base", skillsPath)

You can also share your feedback on Copilot code review. Take the survey.

Comment thread cmd/waza/cmd_run_test.go
Comment thread cmd/waza/cmd_run_test.go Outdated
Comment thread internal/projectconfig/config.go Outdated
Comment thread cmd/waza/cmd_run.go
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 00:25
richardpark-msft added 2 commits March 16, 2026 17:26
Might want to change this later if we think that implicit defaults were useful, but they weren't affected by this code before, so shouldn't be affected afterwards.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates waza run workspace resolution so skills/evals can be discovered relative to the .waza.yaml location (fixing .github/-style layouts), and renames the eval config field from RunsPerTest to TrialsPerTask across the codebase/tests.

Changes:

  • Track the directory that .waza.yaml was loaded from (ProjectConfig.Dir) and use it to resolve config-relative skills/evals paths.
  • Extend workspace detection / eval lookup to handle absolute skills/evals directories (e.g., after resolving from .waza.yaml).
  • Add/adjust tests and testdata for the new discovery behavior; rename RunsPerTestTrialsPerTask throughout.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/workspace/workspace.go Accept absolute skills/evals dirs during workspace detection and FindEval.
internal/projectconfig/config.go Return config path from discovery and set ProjectConfig.Dir accordingly.
internal/projectconfig/config_test.go Add coverage for ProjectConfig.Dir.
cmd/waza/workspace_resolve.go Resolve skills/evals dirs using cfg.Dir before passing to workspace detection.
cmd/waza/cmd_run.go Discover skills from .waza.yaml paths and feed default skill dirs into Copilot engine; rename skillSpecPath field and thread default skills through execution.
cmd/waza/cmd_run_test.go Add integration-style tests validating skill loading under .github/ workspace layouts; adjust for renamed fields.
cmd/waza/cmd_run_workspace_test.go Update for evalSpecPath rename.
cmd/waza/testdata/run/eval-skill-search/** New test fixtures for .waza.yaml + .github/{skills,evals} layout.
internal/models/spec.go Rename Config.RunsPerTestTrialsPerTask and update validation.
internal/models/spec_test.go Update tests for renamed config field.
internal/orchestration/runner.go Use TrialsPerTask for digest/statistics and execution loops.
internal/orchestration/runner_test.go Update expected config field.
internal/orchestration/runner_orchestration_test.go Update expected config field.
internal/orchestration/baseline_test.go Update expected config field.
internal/cache/cache.go Include TrialsPerTask in cache key.
internal/cache/cache_test.go Update tests for renamed config field.
Comments suppressed due to low confidence (1)

cmd/waza/cmd_run.go:375

  • With the new absolute skills/evals directory support, resolveSpecPaths(nil) will return all detected skills even when the current working directory is inside a specific eval folder (e.g., <evals>/<skill>/). That means waza run with no args from inside an eval directory will run multiple evals instead of scoping to the local eval.yaml, which conflicts with the PR description’s stated rules. Consider adding an early check to treat a local eval.yaml (especially when CWD is under the configured evals dir) as the single resolved spec path, and add a test to lock in that behavior.
	// No args — workspace detection
	wsCtx, err := resolveWorkspace(nil)
	if err != nil {
		return nil, fmt.Errorf("no eval.yaml specified and workspace detection failed: %w", err)
	}

You can also share your feedback on Copilot code review. Take the survey.

Comment thread cmd/waza/cmd_run.go
Comment thread internal/projectconfig/config_test.go Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 00:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes waza run skill discovery for workspaces where skills/evals live outside the repo root (e.g., .github/skills + .github/evals) by making .waza.yaml path settings resolve correctly and plumbing discovered skill directories into Copilot engine sessions by default.

Changes:

  • Add ProjectConfig.Dir (directory containing the discovered .waza.yaml) and use it to derive absolute skills/evals paths for workspace detection and waza run skill discovery.
  • Update workspace detection / eval lookup to correctly handle absolute skillsDir/evalsDir options.
  • Rename RunsPerTestTrialsPerTask in the eval spec model and update runner/cache/tests accordingly; add regression tests + testdata for .github/* layouts.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/workspace/workspace.go Allow absolute skillsDir/evalsDir inputs during detection and eval resolution.
internal/projectconfig/config.go Add ProjectConfig.Dir; return config path from discovery to support config-relative resolution.
internal/projectconfig/config_test.go Assert cfg.Dir is set correctly when .waza.yaml is discovered by walking up.
cmd/waza/workspace_resolve.go Build workspace detect options using config-relative (absolute) skills/evals paths.
cmd/waza/cmd_run.go Discover workspace skills from .waza.yaml and inject them as default skill directories; add overridable Copilot client factory for tests.
cmd/waza/cmd_run_test.go Add integration-style tests covering cwd-dependent resolution with .github/skills + .github/evals; update helpers/signatures.
cmd/waza/cmd_run_workspace_test.go Update field rename usage (evalSpecPath).
cmd/waza/testdata/run/eval-skill-search/.waza.yaml New test fixture config pointing to .github/skills and .github/evals.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-a/SKILL.md New skill fixture for workspace resolution tests.
cmd/waza/testdata/run/eval-skill-search/.github/skills/test-skill-b/SKILL.md New skill fixture for workspace resolution tests.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/eval.yaml New eval fixture for workspace resolution tests.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-a/tasks/test-task.yaml New task fixture for workspace resolution tests.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/eval.yaml New eval fixture for workspace resolution tests.
cmd/waza/testdata/run/eval-skill-search/.github/evals/test-skill-b/tasks/test-task.yaml New task fixture for workspace resolution tests.
internal/models/spec.go Rename config field to TrialsPerTask (YAML trials_per_task) and update validation.
internal/models/spec_test.go Update spec parsing tests to validate TrialsPerTask.
internal/orchestration/runner.go Use TrialsPerTask when computing digests and per-task run counts.
internal/orchestration/runner_test.go Update runner tests for TrialsPerTask rename.
internal/orchestration/runner_orchestration_test.go Update orchestration tests for TrialsPerTask rename.
internal/orchestration/baseline_test.go Update baseline tests for TrialsPerTask rename.
internal/cache/cache.go Include TrialsPerTask in cache key generation.
internal/cache/cache_test.go Update cache tests for TrialsPerTask rename.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread internal/projectconfig/config.go
Comment thread cmd/waza/cmd_run.go Outdated
Comment thread cmd/waza/cmd_run_test.go
Comment thread cmd/waza/cmd_run.go
@github-actions github-actions Bot merged commit a583291 into microsoft:main Mar 17, 2026
6 checks passed
@richardpark-msft richardpark-msft deleted the wz-yaml-fixes branch March 17, 2026 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skills aren't loaded properly when doing a 'waza run' with a .github/ style workspace

4 participants