Doing bulk mechanical renames using Go refactor/rename by richardpark-msft · Pull Request #128 · microsoft/waza

richardpark-msft · 2026-03-13T02:12:17Z

BenchmarkSpec to EvalSpec
Config to EvalConfig
TaskSpec renames
- TestCase to TaskSpec
- TestStimulus -> TaskInputs
- ValidatorInline -> Grader
Grader.Kind -> Grader.Type
Some tests and local variables were also renamed
Some comments were updated as well (via search and replace)


Copilot

Pull request overview

This PR applies a large set of mechanical renames across the evaluation (“benchmark”) pipeline, updating core model types and propagating those changes through orchestration, graders, caching, config, CLI, JSON-RPC handlers, and tests.

Changes:

Renames models.BenchmarkSpec → models.EvalSpec and models.Config → models.EvalConfig across runtime code and tests.
Renames task-level structures (TestCase → TaskSpec, TestStimulus → TaskInputs, ValidatorInline → Grader) and updates call sites.
Renames grader identity/type plumbing (GraderKind → GraderType, Grader.Kind() → Grader.Type()), updating all grader implementations and tests.
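As a rough illustration of why the `Kind()` to `Type()` rename touches every grader file, here is a minimal sketch of a Go interface with one implementation. The `textGrader` struct and the `GraderTypeText` constant are hypothetical stand-ins, not the repository's definitions; only the `Grader`, `GraderType`, and `Type()` names come from the summary above.

```go
package main

import "fmt"

// GraderType identifies a grader implementation (called GraderKind before the rename).
type GraderType string

// GraderTypeText is a hypothetical constant used only for this sketch.
const GraderTypeText GraderType = "text"

// Grader is a reduced stand-in for the interface in internal/graders/grader.go.
type Grader interface {
	Type() GraderType // was Kind() before the rename
}

// textGrader is a hypothetical implementation.
type textGrader struct{}

func (textGrader) Type() GraderType { return GraderTypeText }

// Compile-time check: the build fails if textGrader stops satisfying Grader,
// for example when the interface method is renamed but this method is not.
var _ Grader = textGrader{}

func main() {
	var g Grader = textGrader{}
	fmt.Println(g.Type()) // text
}
```

Because Go interfaces are satisfied implicitly, renaming the interface method without renaming each implementation is a compile error, which is why a tool-driven rename is a natural fit here.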

Reviewed changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 4 comments.


File	Description
internal/trigger/runner_test.go	Updates tests to use `EvalSpec` instead of `BenchmarkSpec`.
internal/transcript/transcript_test.go	Updates transcript test fixtures to `TaskSpec` / `TaskInputs`.
internal/transcript/transcript.go	Updates `BuildTaskTranscript` to accept `TaskSpec` and read from `Inputs`.
internal/suggest/suggest.go	Updates YAML validation to unmarshal into `EvalSpec`.
internal/orchestration/runner_test.go	Updates runner tests for `EvalSpec` / `EvalConfig` and task structs.
internal/orchestration/runner_orchestration_test.go	Updates orchestration tests for renamed spec/task/grader fields and `Type()`.
internal/orchestration/runner.go	Converts runner task loading/execution/grading flow to `TaskSpec` / `TaskInputs`.
internal/orchestration/filter_test.go	Updates filter tests to operate on `TaskSpec`.
internal/orchestration/filter.go	Updates filter APIs to accept/return `TaskSpec`.
internal/orchestration/csv_integration_test.go	Updates CSV task generation tests to `EvalSpec` and `TaskSpec.Inputs`.
internal/orchestration/baseline_test.go	Updates baseline tests to use `EvalSpec` / `EvalConfig`.
internal/models/taskspec_test.go	Adds coverage for `should_trigger` YAML → `ExpectedTrigger` pointer behavior.
internal/models/taskspec.go	Renames task model types; updates inline grader struct to `Grader` with `Type`.
internal/models/spec.go	Renames spec/config types to `EvalSpec` / `EvalConfig` and updates grader kind type.
internal/models/outcome.go	Renames `GraderKind` → `GraderType` and updates results typing.
internal/models/grader_params_test.go	Updates polymorphic parameter tests to read task-level graders via `tc.Graders`.
internal/models/grader_params.go	Updates parameter decoding to accept `GraderType`.
internal/models/baseline_test.go	Updates baseline YAML test to unmarshal into `EvalSpec`.
internal/jsonrpc/handlers.go	Updates JSON-RPC eval handling to use `EvalSpec` / `EvalConfig`.
internal/graders/trigger_grader_test.go	Updates trigger grader tests for `Type()` and `TaskSpec.Inputs`.
internal/graders/trigger_grader.go	Updates trigger grader to implement `Type()` and read prompt from `Inputs`.
internal/graders/tool_constraint_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/tool_constraint_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/text_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/text_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/skill_invocation_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/skill_invocation_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/run.go	Updates runner entrypoint to accept `TaskSpec` and task-level `Graders`.
internal/graders/prompt_grader.go	Updates prompt grader interface to `Type()` and result typing to `GraderType`.
internal/graders/program_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/program_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/json_schema_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/json_schema_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/inline_script_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/inline_script_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/grader.go	Updates the grader interface (`Type()`) and context to reference `TaskSpec`.
internal/graders/file_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/file_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/diff_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/behavior_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/behavior_grader.go	Updates grader interface implementation to `Type()`.
internal/graders/action_sequence_grader_test.go	Updates tests from `Kind()` to `Type()`.
internal/graders/action_sequence_grader.go	Updates grader interface implementation to `Type()`.
internal/config/config_test.go	Updates config tests to pass `EvalSpec`.
internal/config/config.go	Updates BenchmarkConfig to store/return `*EvalSpec`.
internal/cache/cache_test.go	Updates cache tests to use `EvalSpec` and `TaskSpec`.
internal/cache/cache.go	Updates cache key computation to accept `EvalSpec` + `TaskSpec` and read resources from `Inputs`.
cmd/waza/newtask/converters_test.go	Updates new-task converter tests for `TaskSpec` and task-level `Graders`.
cmd/waza/newtask/converters.go	Updates Copilot log → task converter to build `TaskSpec` with `Inputs` + `Graders`.
cmd/waza/cmd_run_suggest_test.go	Updates suggest-related tests to use `EvalSpec` / `EvalConfig`.
cmd/waza/cmd_run_suggest.go	Propagates `EvalSpec` through suggestion/report generation helpers and task loading.
cmd/waza/cmd_run.go	Updates single-model execution path to accept `*EvalSpec`.
cmd/waza/cmd_new_task_test.go	Updates end-to-end new-task test expectations to `TaskSpec`/`Graders`.
cmd/waza/cmd_grade.go	Updates grading helpers to accept `EvalSpec` / `TaskSpec`.


codecov-commenter · 2026-03-13T02:19:50Z

Codecov Report

❌ Patch coverage is 78.44311% with 36 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@3068653). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
cmd/waza/cmd_run_suggest.go	66.66%	7 Missing and 2 partials ⚠️
internal/orchestration/runner.go	82.69%	8 Missing and 1 partial ⚠️
internal/jsonrpc/handlers.go	37.50%	4 Missing and 1 partial ⚠️
internal/graders/run.go	0.00%	4 Missing ⚠️
internal/graders/prompt_grader.go	0.00%	3 Missing ⚠️
cmd/waza/cmd_grade.go	88.88%	0 Missing and 1 partial ⚠️
cmd/waza/cmd_new_task.go	50.00%	0 Missing and 1 partial ⚠️
cmd/waza/newtask/converters.go	90.00%	1 Missing ⚠️
internal/graders/diff_grader.go	0.00%	1 Missing ⚠️
internal/models/spec.go	80.00%	1 Missing ⚠️
... and 1 more

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #128   +/-   ##
=======================================
  Coverage        ?   73.51%           
=======================================
  Files           ?      138           
  Lines           ?    15785           
  Branches        ?        0           
=======================================
  Hits            ?    11605           
  Misses          ?     3338           
  Partials        ?      842
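The figures in the coverage block can be cross-checked with a small calculation. The sketch below assumes the usual Codecov ratio of hits divided by hits plus misses plus partials; the function name is hypothetical, and the numbers are copied from the table above.

```go
package main

import "fmt"

// coveragePercent returns hits / (hits + misses + partials) as a percentage.
// This ratio is an assumption about how the report derives its headline number.
func coveragePercent(hits, misses, partials int) float64 {
	total := hits + misses + partials
	if total == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(total)
}

func main() {
	hits, misses, partials := 11605, 3338, 842
	fmt.Println(hits + misses + partials) // 15785, matching the Lines row
	fmt.Printf("%.3f\n", coveragePercent(hits, misses, partials))
}
```

The three counts sum to the reported 15785 lines, and the ratio comes out at roughly 73.519, which the report displays as 73.51%.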

Flag	Coverage Δ
go-implementation	`73.51% <78.44%> (?)`


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR performs a broad mechanical refactor/rename across the evaluation pipeline, updating core model types (specs, task/test definitions, and graders) and propagating those renames through orchestration, graders, config, cache, CLI, and related tests.

Changes:

Rename core models: BenchmarkSpec→EvalSpec, Config→EvalConfig, TestCase→TaskSpec, TestStimulus→TaskInputs, inline task validators→Grader with Kind→Type.
Update orchestration/runner, graders, caching, JSON-RPC handlers, and CLI code to use the new types/fields.
Add/update tests to cover renamed structures (including a new test for should_trigger decoding).
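The `should_trigger` test mentioned above concerns decoding into a pointer field. As a minimal sketch of why a pointer is useful there, the example below assumes `ExpectedTrigger` is a `*bool` and uses `encoding/json` as a stand-in for the YAML decoder so that it runs with the standard library alone; `taskSketch` and `describe` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskSketch is a hypothetical, reduced stand-in for models.TaskSpec.
// The *bool type for ExpectedTrigger is an assumption.
type taskSketch struct {
	ExpectedTrigger *bool `json:"should_trigger,omitempty"`
}

// describe decodes doc and reports which of the three states the pointer holds.
func describe(doc string) (string, error) {
	var t taskSketch
	if err := json.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	switch {
	case t.ExpectedTrigger == nil:
		return "unset", nil
	case *t.ExpectedTrigger:
		return "true", nil
	default:
		return "false", nil
	}
}

func main() {
	docs := []string{`{}`, `{"should_trigger": false}`, `{"should_trigger": true}`}
	for _, doc := range docs {
		state, err := describe(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", doc, state)
	}
}
```

A plain `bool` would collapse "absent" and "false" into the same value; the pointer keeps them distinct, which is presumably what the new test pins down.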

Reviewed changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 1 comment.


File	Description
internal/trigger/runner_test.go	Update trigger runner tests to use `EvalSpec`.
internal/transcript/transcript_test.go	Update transcript tests to use `TaskSpec`/`TaskInputs`.
internal/transcript/transcript.go	Accept `*TaskSpec` and read `Inputs.Message` for transcripts.
internal/suggest/suggest.go	Unmarshal eval YAML into `EvalSpec` instead of `BenchmarkSpec`.
internal/orchestration/runner_test.go	Update orchestration runner tests for `EvalSpec`/`TaskSpec`.
internal/orchestration/runner_orchestration_test.go	Update orchestration integration tests for renamed grader/task structures.
internal/orchestration/runner.go	Migrate task loading/execution path to `TaskSpec` and `Inputs`.
internal/orchestration/filter_test.go	Update filtering tests to operate on `[]*TaskSpec`.
internal/orchestration/filter.go	Update filter API to operate on `[]*TaskSpec`.
internal/orchestration/csv_integration_test.go	Update CSV task generation tests to validate `Inputs.Message`.
internal/orchestration/baseline_test.go	Update baseline orchestration tests to use `EvalSpec`/`EvalConfig`.
internal/models/taskspec_test.go	Add coverage for `should_trigger` decoding into `ExpectedTrigger`.
internal/models/taskspec.go	Rename task model types and inline graders; update YAML unmarshalling accordingly.
internal/models/spec.go	Rename spec/config types to `EvalSpec`/`EvalConfig` and update validation.
internal/models/outcome.go	Rename `GraderKind` type to `GraderType` (constants retained).
internal/models/grader_params_test.go	Update grader parameter decoding test to use `Graders`.
internal/models/grader_params.go	Update parameter decoding entrypoint to accept `GraderType`.
internal/models/baseline_test.go	Update baseline YAML parsing test to use `EvalSpec`.
internal/jsonrpc/handlers.go	Update eval get/validate handlers to use `EvalSpec` and `EvalConfig`.
internal/graders/trigger_grader_test.go	Update trigger grader tests for `Type()` and `TaskSpec.Inputs`.
internal/graders/trigger_grader.go	Rename grader interface method to `Type()` and update prompt access.
internal/graders/tool_constraint_grader_test.go	Update tool-constraint grader tests for `Type()`.
internal/graders/tool_constraint_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/text_grader_test.go	Update text grader tests for `Type()`.
internal/graders/text_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/skill_invocation_grader_test.go	Update skill invocation grader tests for `Type()`.
internal/graders/skill_invocation_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/run.go	Run task-level graders from `TaskSpec.Graders` and validate `Type`.
internal/graders/prompt_grader.go	Rename grader interface method to `Type()` and propagate into results.
internal/graders/program_grader_test.go	Update program grader tests for `Type()`.
internal/graders/program_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/json_schema_grader_test.go	Update JSON schema grader tests for `Type()`.
internal/graders/json_schema_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/inline_script_grader_test.go	Update inline-script grader tests for `Type()`.
internal/graders/inline_script_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/grader.go	Update grader interface to `Type()` and context to reference `*TaskSpec`.
internal/graders/file_grader_test.go	Update file grader tests for `Type()`.
internal/graders/file_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/diff_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/behavior_grader_test.go	Update behavior grader tests for `Type()`.
internal/graders/behavior_grader.go	Implement `Type()` instead of `Kind()`.
internal/graders/action_sequence_grader_test.go	Update action-sequence grader tests for `Type()`.
internal/graders/action_sequence_grader.go	Implement `Type()` instead of `Kind()`.
internal/config/config_test.go	Update config tests to use `EvalSpec`.
internal/config/config.go	Store `*EvalSpec` in `BenchmarkConfig` and update getter signature.
internal/cache/cache_test.go	Update cache tests for `EvalSpec`/`TaskSpec`/`Inputs`.
internal/cache/cache.go	Update cache key inputs to `EvalSpec`/`TaskSpec` and use `Inputs.Resources`.
cmd/waza/newtask/converters_test.go	Update task generation tests for `TaskSpec` and inline `Graders`.
cmd/waza/newtask/converters.go	Emit `*TaskSpec`, populate `Inputs.Message`, and append `Graders`.
cmd/waza/cmd_run_suggest_test.go	Update suggest tests to use `EvalSpec`/`EvalConfig`.
cmd/waza/cmd_run_suggest.go	Update suggest pipeline to accept `EvalSpec` and load `[]TaskSpec`.
cmd/waza/cmd_run.go	Update run path to accept `*EvalSpec` in `runSingleModel`.
cmd/waza/cmd_new_task_test.go	Update end-to-end task generation test expected `TaskSpec` shape.
cmd/waza/cmd_grade.go	Update grading path to operate on `EvalSpec` and `TaskSpec`.

Comments suppressed due to low confidence (1)

internal/models/taskspec.go:17

TaskSpec.Inputs was renamed from Stimulus, but its JSON tag is still json:"stimulus". This makes the JSON representation inconsistent with the field name and the task schema (which uses inputs), and is likely an accidental leftover from the mechanical rename. Consider changing the JSON tag to json:"inputs" (or removing the JSON tag if TaskSpec is not meant to be JSON-serialized).
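To make the effect of the leftover tag concrete, here is a small sketch that marshals the same value through two hypothetical structs, one with the stale tag and one with the suggested tag. The struct names and the `message` key are assumptions for illustration; only the `stimulus` and `inputs` tags come from the comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskInputs is a reduced stand-in; the "message" key is hypothetical.
type TaskInputs struct {
	Message string `json:"message"`
}

// staleTag mirrors the leftover described above: field renamed, tag unchanged.
type staleTag struct {
	Inputs TaskInputs `json:"stimulus"`
}

// fixedTag applies the suggested tag.
type fixedTag struct {
	Inputs TaskInputs `json:"inputs"`
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	in := TaskInputs{Message: "hello"}
	fmt.Println(mustJSON(staleTag{Inputs: in})) // {"stimulus":{"message":"hello"}}
	fmt.Println(mustJSON(fixedTag{Inputs: in})) // {"inputs":{"message":"hello"}}
}
```

A rename tool changes Go identifiers but does not touch the string contents of struct tags, so this kind of mismatch has to be caught by review or by a test on the serialized form.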


…ark-msft/waza into wz-bug-cant-find-skills

Copilot

Pull request overview

Bulk mechanical rename sweep aligning core evaluation/task model naming across the Go codebase (EvalSpec/EvalConfig, TaskSpec/TaskInputs, Grader.Type) and updating all call sites accordingly.

Changes:

Renamed BenchmarkSpec → EvalSpec and Config → EvalConfig across config loading/validation, orchestration, CLI, and JSON-RPC handlers.
Renamed task model TestCase → TaskSpec (with TestStimulus → TaskInputs) and updated YAML/loader functions (LoadTaskSpec, task filtering, CSV task generation).
Renamed grader APIs (Grader.Kind() → Grader.Type(), GraderKind → GraderType) and updated built-in graders + tests.

Reviewed changes

Copilot reviewed 62 out of 62 changed files in this pull request and generated 3 comments.


File	Description
internal/trigger/runner_test.go	Updates runner tests to construct configs from `models.EvalSpec`.
internal/transcript/transcript_test.go	Updates transcript test to use `TaskSpec`/`TaskInputs`.
internal/transcript/transcript.go	Updates `BuildTaskTranscript` to accept `*models.TaskSpec` and read from `Inputs`.
internal/suggest/suggest_test.go	Updates test comment to reference `EvalSpec`.
internal/suggest/suggest.go	Validates suggested YAML by unmarshalling into `models.EvalSpec`.
internal/suggest/prompt.go	Updates prompt wording to require `EvalSpec` YAML.
internal/orchestration/runner_test.go	Updates orchestration runner tests to use `EvalSpec`, `EvalConfig`, `TaskSpec`.
internal/orchestration/runner_orchestration_test.go	Updates orchestration integration-style tests for renamed models and grader fields.
internal/orchestration/runner.go	Renames task loading/execution pipeline to `TaskSpec` terminology and updates request building.
internal/orchestration/filter_test.go	Updates filter tests for `FilterTaskSpecs` and `TaskSpec` helpers.
internal/orchestration/filter.go	Renames filter entrypoint to `FilterTaskSpecs` and updates argument types.
internal/orchestration/csv_integration_test.go	Updates CSV task-loading tests to `loadTaskSpecsFromCSV` and `EvalSpec`.
internal/orchestration/baseline_test.go	Updates baseline tests to construct `EvalSpec`/`EvalConfig`.
internal/models/taskspec_test.go	Updates loader test names and calls to `LoadTaskSpec`.
internal/models/taskspec.go	Renames task structs to `TaskSpec`/`TaskInputs` and inline validators to `Grader`.
internal/models/spec_test.go	Renames spec/task loader tests to new loader functions/types.
internal/models/spec.go	Renames spec structs/functions to `EvalSpec`/`EvalConfig`/`LoadEvalSpec`.
internal/models/outcome.go	Renames `GraderKind` type to `GraderType` and updates result fields accordingly.
internal/models/grader_params_test.go	Updates parameter decoding tests for `LoadEvalSpec`/`LoadTaskSpec` and `Graders`.
internal/models/grader_params.go	Updates grader-parameter decoding signature to accept `GraderType`.
internal/models/baseline_test.go	Updates baseline YAML serialization tests to use `EvalSpec`.
internal/jsonrpc/handlers.go	Updates JSON-RPC handlers to load `EvalSpec` and return updated config types.
internal/graders/trigger_grader_test.go	Updates trigger grader tests for `Type()` and `TaskSpec` context.
internal/graders/trigger_grader.go	Updates trigger grader interface to `Type()` and reads prompt from `TaskSpec.Inputs`.
internal/graders/tool_constraint_grader_test.go	Updates tool-constraint grader tests for `Type()`.
internal/graders/tool_constraint_grader.go	Updates tool-constraint grader interface to `Type()`.
internal/graders/text_grader_test.go	Updates text grader tests for `Type()`.
internal/graders/text_grader.go	Updates text grader interface to `Type()`.
internal/graders/skill_invocation_grader_test.go	Updates skill-invocation grader tests for `Type()`.
internal/graders/skill_invocation_grader.go	Updates skill-invocation grader interface to `Type()`.
internal/graders/run.go	Updates `RunAll` signature to accept `*models.TaskSpec` and uses `tc.Graders`.
internal/graders/prompt_grader_test.go	Updates prompt grader test to load `EvalSpec`.
internal/graders/prompt_grader.go	Updates prompt grader interface to `Type()` and result typing.
internal/graders/program_grader_test.go	Updates program grader tests for `Type()`.
internal/graders/program_grader.go	Updates program grader interface to `Type()`.
internal/graders/json_schema_grader_test.go	Updates JSON-schema grader tests for `Type()`.
internal/graders/json_schema_grader.go	Updates JSON-schema grader interface to `Type()`.
internal/graders/inline_script_grader_test.go	Updates inline-script grader tests for `Type()`.
internal/graders/inline_script_grader.go	Updates inline-script grader interface to `Type()`.
internal/graders/grader.go	Updates grader interface (`Type`) and grading context (`TaskSpec`).
internal/graders/file_grader_test.go	Updates file grader tests for `Type()`.
internal/graders/file_grader.go	Updates file grader interface to `Type()`.
internal/graders/diff_grader.go	Updates diff grader interface to `Type()`.
internal/graders/behavior_grader_test.go	Updates behavior grader tests for `Type()`.
internal/graders/behavior_grader.go	Updates behavior grader interface to `Type()`.
internal/graders/action_sequence_grader_test.go	Updates action-sequence grader tests for `Type()`.
internal/graders/action_sequence_grader.go	Updates action-sequence grader interface to `Type()`.
internal/execution/copilot_test.go	Renames local test table variable from `testCases` to `taskSpecs`.
internal/config/config_test.go	Updates config tests to pass `*models.EvalSpec`.
internal/config/config.go	Updates `BenchmarkConfig` to hold `*models.EvalSpec` and adjusts getter types.
internal/cache/cache_test.go	Updates cache tests to use `EvalSpec`/`TaskSpec`/`TaskInputs`.
internal/cache/cache.go	Updates cache key inputs and fixture enumeration to use `TaskSpec.Inputs.Resources`.
cmd/waza/newtask/converters_test.go	Updates converter test to expect `TaskSpec` and inline `Graders` using `Type`.
cmd/waza/newtask/converters.go	Renames converter API to `CreateTaskSpecFromCopilotLog` and updates produced model fields.
cmd/waza/cmd_run_suggest_test.go	Updates suggest tests to construct `EvalSpec`/`EvalConfig`.
cmd/waza/cmd_run_suggest.go	Updates suggest pipeline to accept `*models.EvalSpec` and load `TaskSpec`s.
cmd/waza/cmd_run.go	Updates run command to load `EvalSpec` and pass it through execution.
cmd/waza/cmd_new_task_test.go	Updates new-task e2e test to load/compare `TaskSpec` with `Graders`.
cmd/waza/cmd_new_task.go	Updates new-task generation pipeline to use `CreateTaskSpecFromCopilotLog`.
cmd/waza/cmd_grade.go	Updates grade command to load `EvalSpec` and grade `TaskSpec` runs.
README.md	Updates internal/models documentation line to refer to `EvalSpec`/`TaskSpec`.
AGENTS.md	Updates architecture notes and naming table to reflect new model names.

Comments suppressed due to low confidence (3)

internal/models/spec_test.go:83

Several test function names look mangled by the mechanical rename (e.g., TestBenchmarkEvaltsDeserialization). Consider renaming these to clear TestEvalSpec_...-style names so test intent is obvious and consistent.

internal/models/taskspec.go:17

TaskSpec.Inputs still has the JSON tag json:"stimulus". Since task.get returns TaskSpec as JSON (see JSON-RPC handler), this will emit stimulus instead of inputs and is inconsistent with the YAML/schema. Update the JSON tag to json:"inputs" (and consider omitempty if appropriate).

internal/orchestration/runner.go:631

The comment and error message here still refer to "test cases" even though this function loads TaskSpecs. Update wording (and the error message) to "tasks"/"task specs" to match the new naming.


Copilot

Pull request overview

This PR performs a broad mechanical rename across the Go codebase to align terminology around “evals”, “tasks”, and “graders”, including updating loaders, runners, JSON-RPC handlers, and tests.

Changes:

Rename core models/types: BenchmarkSpec→EvalSpec, Config→EvalConfig, TestCase→TaskSpec, TestStimulus→TaskInputs, ValidatorInline→Grader, Grader.Kind→Grader.Type.
Update orchestration, graders, cache, transcript, JSON-RPC handlers, and CLI paths to use the renamed types/APIs.
Rename and adjust tests/docs to match the new naming.

Reviewed changes

Copilot reviewed 62 out of 62 changed files in this pull request and generated 4 comments.


File	Description
internal/trigger/runner_test.go	Updates runner tests to construct `EvalSpec`.
internal/transcript/transcript_test.go	Updates transcript tests to use `TaskSpec`/`TaskInputs`.
internal/transcript/transcript.go	Updates transcript builder signature/field access for `TaskSpec`.
internal/suggest/suggest_test.go	Updates comment/expectations to refer to `EvalSpec`.
internal/suggest/suggest.go	Updates YAML validation to unmarshal into `EvalSpec`.
internal/suggest/prompt.go	Updates generated prompt text to reference `EvalSpec`.
internal/orchestration/runner_test.go	Updates orchestration runner tests for `EvalSpec`/`TaskSpec`.
internal/orchestration/runner_orchestration_test.go	Updates orchestration tests for `EvalSpec`/task graders naming.
internal/orchestration/runner.go	Renames task loading/filtering/execution plumbing to `TaskSpec`.
internal/orchestration/filter_test.go	Renames filter tests + helpers to `FilterTaskSpecs`.
internal/orchestration/filter.go	Renames filter API to operate on `TaskSpec`.
internal/orchestration/csv_integration_test.go	Renames CSV task generation tests to `TaskSpec`.
internal/orchestration/baseline_test.go	Updates baseline tests to use `EvalSpec`/`EvalConfig`.
internal/models/taskspec_test.go	Renames loader tests to `LoadTaskSpec`.
internal/models/taskspec.go	Introduces `TaskSpec`/`TaskInputs`/`Grader` rename and loader rename.
internal/models/spec_test.go	Renames spec loader tests to `LoadEvalSpec` and task loader tests to `LoadTaskSpec`.
internal/models/spec.go	Renames spec model/loader to `EvalSpec`/`LoadEvalSpec` and config type to `EvalConfig`.
internal/models/outcome.go	Renames `GraderKind` type to `GraderType` (constants retained).
internal/models/grader_params_test.go	Updates polymorphic grading parameter tests for `EvalSpec`/`TaskSpec`.
internal/models/grader_params.go	Updates parameter decoding to accept `GraderType`.
internal/models/baseline_test.go	Updates baseline serialization tests to `EvalSpec`.
internal/jsonrpc/handlers.go	Updates eval/task JSON-RPC handlers to load `EvalSpec`/`TaskSpec`.
internal/graders/trigger_grader_test.go	Updates trigger grader tests to `Type()` + `TaskSpec` context.
internal/graders/trigger_grader.go	Renames grader method to `Type()` and switches to `TaskSpec` in context.
internal/graders/tool_constraint_grader_test.go	Updates tests to assert `Type()` instead of `Kind()`.
internal/graders/tool_constraint_grader.go	Renames grader method to `Type()`.
internal/graders/text_grader_test.go	Updates tests to assert `Type()`.
internal/graders/text_grader.go	Renames grader method to `Type()`.
internal/graders/skill_invocation_grader_test.go	Updates tests to assert `Type()`.
internal/graders/skill_invocation_grader.go	Renames grader method to `Type()`.
internal/graders/run.go	Updates runner to accept `TaskSpec` and iterate `TaskSpec.Graders`.
internal/graders/prompt_grader_test.go	Updates spec loader to `LoadEvalSpec`.
internal/graders/prompt_grader.go	Renames grader method to `Type()` and updates result construction.
internal/graders/program_grader_test.go	Updates tests to assert `Type()`.
internal/graders/program_grader.go	Renames grader method to `Type()`.
internal/graders/json_schema_grader_test.go	Updates tests to assert `Type()`.
internal/graders/json_schema_grader.go	Renames grader method to `Type()`.
internal/graders/inline_script_grader_test.go	Updates tests to assert `Type()`.
internal/graders/inline_script_grader.go	Renames grader method to `Type()`.
internal/graders/grader.go	Renames interface method to `Type()` and context field to `TaskSpec`.
internal/graders/file_grader_test.go	Updates tests to assert `Type()`.
internal/graders/file_grader.go	Renames grader method to `Type()`.
internal/graders/diff_grader.go	Renames grader method to `Type()`.
internal/graders/behavior_grader_test.go	Updates tests to assert `Type()`.
internal/graders/behavior_grader.go	Renames grader method to `Type()`.
internal/graders/action_sequence_grader_test.go	Updates tests to assert `Type()`.
internal/graders/action_sequence_grader.go	Renames grader method to `Type()`.
internal/execution/copilot_test.go	Renames local vars in tests from test-case terminology to task terminology.
internal/config/config_test.go	Updates config tests to build configs with `EvalSpec`.
internal/config/config.go	Updates `BenchmarkConfig` to store an `*EvalSpec`.
internal/cache/cache_test.go	Updates cache tests to use `EvalSpec`/`TaskSpec`.
internal/cache/cache.go	Updates cache key computation to accept `EvalSpec`/`TaskSpec`.
cmd/waza/newtask/converters_test.go	Renames converter tests to `CreateTaskSpecFromCopilotLog`.
cmd/waza/newtask/converters.go	Renames converter API to produce `TaskSpec` with task-level graders.
cmd/waza/cmd_run_suggest_test.go	Updates suggest tests to use `EvalSpec`/`EvalConfig`.
cmd/waza/cmd_run_suggest.go	Updates suggest plumbing to accept `*EvalSpec` and load `TaskSpec`s.
cmd/waza/cmd_run.go	Updates eval runner to load `EvalSpec` and pass it through.
cmd/waza/cmd_new_task_test.go	Updates end-to-end new-task tests to load `TaskSpec`.
cmd/waza/cmd_new_task.go	Updates command implementation to use the renamed newtask converter API.
cmd/waza/cmd_grade.go	Updates grading command to load `EvalSpec` and grade `TaskSpec` runs.
README.md	Updates repository structure docs to reflect new type names.
AGENTS.md	Updates architecture docs to reflect new filenames/type names.

Comments suppressed due to low confidence (3)

internal/models/taskspec.go:16

TaskSpec.Inputs is now the canonical field name, but the JSON tag is still json:"stimulus". Since task.get returns *models.TaskSpec directly, this makes the JSON-RPC payload inconsistent with the rename (and likely with the YAML/schema key inputs). Consider updating the JSON tag to inputs (or, if backward compatibility is required, return a separate DTO from the handler to keep the old field name).

This issue also appears on line 62 of the same file.

internal/models/taskspec.go:66

Grader.Identifier is still the Go field name used throughout the codebase, but its JSON tag was changed to json:"name". Because task.get returns the TaskSpec directly, this is an API breaking change and also inconsistent with models.GraderConfig.Identifier (which serializes as identifier) and models.GraderResults.Name (which serializes as identifier). Consider keeping the JSON tag as identifier (or rename the field to Name everywhere) to avoid surprising RPC consumers.

cmd/waza/cmd_run_suggest.go:518

This error message still says "failed to load test case" even though loadTaskSpecsFromFiles loads TaskSpecs. Renaming it to "failed to load task spec" (or "task") would keep terminology consistent with the rest of the renamed code.

	for _, path := range testFiles {
		tc, err := models.LoadTaskSpec(path)
		if err != nil {
			return nil, fmt.Errorf("failed to load test case %s: %w", path, err)
		}


+		tc, err := models.LoadTaskSpec(path)
 		if err != nil {
 			return nil, fmt.Errorf("failed to load test case %s: %w", path, err)
 		}


+		yml, err := yaml.Marshal(taskSpec)
 		if err != nil {
-			return nil, fmt.Errorf("marshaling test case %s: %w", tc.TestID, err)
+			return nil, fmt.Errorf("marshaling test case %s: %w", taskSpec.TestID, err)
 		}


 // CacheKey generates a unique cache key for a test case run
 // The key is based on:
 // - spec content (name, config, graders)
 // - task content (test case definition)
 // - model ID
 // - fixture file hashes
-func CacheKey(spec *models.BenchmarkSpec, task *models.TestCase, fixtureDir string) (string, error) {
+func CacheKey(spec *models.EvalSpec, task *models.TaskSpec, fixtureDir string) (string, error) {
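The doc comment on `CacheKey` lists four inputs to the key. The sketch below shows one way such a key could be derived, assuming SHA-256 over JSON-encoded inputs and precomputed fixture hashes; it is not the repository's implementation, and `cacheKeySketch` with its parameters is hypothetical (the real function takes a fixture directory instead).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// cacheKeySketch hashes the spec, the task, the model ID, and per-file fixture
// hashes into one hex digest. Inputs are JSON-encoded and separated by a zero
// byte so that adjacent parts cannot run together.
func cacheKeySketch(spec, task any, modelID string, fixtureHashes map[string]string) (string, error) {
	h := sha256.New()
	for _, part := range []any{spec, task, modelID} {
		b, err := json.Marshal(part)
		if err != nil {
			return "", err
		}
		h.Write(b)
		h.Write([]byte{0})
	}
	// Sort fixture paths so the key does not depend on map iteration order.
	paths := make([]string, 0, len(fixtureHashes))
	for p := range fixtureHashes {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Fprintf(h, "%s=%s\x00", p, fixtureHashes[p])
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	key, err := cacheKeySketch(
		map[string]string{"name": "example-eval"},
		map[string]string{"id": "task-1"},
		"model-a",
		map[string]string{"fixtures/a.txt": "hash-a"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Any change to the spec, the task, the model, or a fixture hash yields a different key, which is the property a result cache needs to avoid serving stale runs.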


 // If taskPatterns and tagPatterns are specified the result is the intersection of the matches between them.
 // If both taskPatterns and tagPatterns are empty, all test cases are returned.
-func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error) {
+func FilterTaskSpecs(taskSpecs []*models.TaskSpec, taskPatterns []string, tagPatterns []string) ([]*models.TaskSpec, error) {
 	if len(taskPatterns) == 0 && len(tagPatterns) == 0 {
-		return testCases, nil
+		return taskSpecs, nil
 	}

-	var matched []*models.TestCase
+	var matched []*models.TaskSpec

-	for _, tc := range testCases {
-		taskNameMatch, err := matchesTaskOrDisplayName(tc, taskPatterns)
+	for _, taskSpec := range taskSpecs {
+		taskNameMatch, err := matchesTaskOrDisplayName(taskSpec, taskPatterns)

 		if err != nil {
 			return nil, err
 		}

-		tagNameMatch, err := matchesTags(tc, tagPatterns)
+		tagNameMatch, err := matchesTags(taskSpec, tagPatterns)

 		if err != nil {
 			return nil, err
 		}

 		if taskNameMatch && tagNameMatch {
-			matched = append(matched, tc)
+			matched = append(matched, taskSpec)
 		}
 	}

 	return matched, nil
 }

 // matchesTaskOrDisplayName reports whether a test case's DisplayName or TestID matches any pattern.
-func matchesTaskOrDisplayName(tc *models.TestCase, patterns []string) (bool, error) {
+func matchesTaskOrDisplayName(tc *models.TaskSpec, patterns []string) (bool, error) {
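
The filtering rule in the hunk above (no patterns returns everything, otherwise a task must satisfy both the name filter and the tag filter) can be sketched in isolation. This is a hedged illustration, not the repository's code. The `TaskSpec` fields shown, the `matchesAny` helper, the use of `path.Match` glob patterns, and the rule that an empty pattern list matches everything are all assumptions, since the bodies of `matchesTaskOrDisplayName` and `matchesTags` are not part of the diff.

```go
package main

import "path"

// TaskSpec is a hypothetical, trimmed-down stand-in for models.TaskSpec.
type TaskSpec struct {
	TestID      string
	DisplayName string
	Tags        []string
}

// matchesAny reports whether any pattern matches any of the given values.
// Assumption: an empty pattern list matches everything, so an unset filter
// never excludes a task.
func matchesAny(patterns []string, values ...string) (bool, error) {
	if len(patterns) == 0 {
		return true, nil
	}
	for _, p := range patterns {
		for _, v := range values {
			ok, err := path.Match(p, v)
			if err != nil {
				return false, err // malformed glob pattern
			}
			if ok {
				return true, nil
			}
		}
	}
	return false, nil
}

// filterTaskSpecs mirrors the shape of FilterTaskSpecs in the diff: it keeps
// only the tasks that pass both the name filter and the tag filter.
func filterTaskSpecs(specs []*TaskSpec, taskPatterns, tagPatterns []string) ([]*TaskSpec, error) {
	if len(taskPatterns) == 0 && len(tagPatterns) == 0 {
		return specs, nil
	}
	var matched []*TaskSpec
	for _, s := range specs {
		nameOK, err := matchesAny(taskPatterns, s.DisplayName, s.TestID)
		if err != nil {
			return nil, err
		}
		tagOK, err := matchesAny(tagPatterns, s.Tags...)
		if err != nil {
			return nil, err
		}
		if nameOK && tagOK {
			matched = append(matched, s)
		}
	}
	return matched, nil
}
```

With three tasks `auth-001` (tags `auth`, `smoke`), `auth-002` (tag `auth`), and `db-001` (tag `smoke`), the call `filterTaskSpecs(specs, []string{"auth-*"}, []string{"smoke"})` would keep only `auth-001` in this sketch, because it is the only task that passes both filters.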


richardpark-msft · 2026-03-17T16:45:30Z

Going to take a run at this with copilot and just not bother trying to rebase/merge :)

Richard Park added 3 commits March 13, 2026 02:05

Renames only (BenchmarkSpec and Config to EvalSpec and EvalConfig)

2c7a917

TaskSpec renames

b7500c6

- TestCase to TaskSpec
- TestStimulus -> TaskInputs
- ValidatorInline -> Grader

Grader.Kind -> Grader.Type

8dea91c

richardpark-msft requested review from chlowell, spboyer and wbreza as code owners March 13, 2026 02:12

Copilot AI review requested due to automatic review settings March 13, 2026 02:12

github-actions Bot enabled auto-merge (squash) March 13, 2026 02:12

Copilot started reviewing on behalf of richardpark-msft March 13, 2026 02:12

gofmt

d4078bc

Copilot AI reviewed Mar 13, 2026

Comment thread internal/jsonrpc/handlers.go

Comment thread internal/graders/run.go Outdated

Comment thread internal/models/baseline_test.go

Comment thread internal/models/grader_params_test.go

Potential fix for pull request finding

2d53a2b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 13, 2026 02:22

Copilot started reviewing on behalf of richardpark-msft March 13, 2026 02:23

Richard Park added 3 commits March 13, 2026 02:23

Fixing a test to match the name of the spec type.

f56ec84

Fix another function name.

bf632cb

Eradicating BenchmarkSpec name in favor of EvalSpec

966f335

Copilot AI reviewed Mar 13, 2026

Comment thread internal/orchestration/runner_orchestration_test.go Outdated

Richard Park added 3 commits March 13, 2026 02:32

Some more renames, mostly just F2 renames, and some search and replace.

08cd163

Merge branch 'wz-bug-cant-find-skills' of https://github.com/richardpark-msft/waza into wz-bug-cant-find-skills

87c86fa

Fixing a test description (cosmetic)

4efd5d4

Copilot AI review requested due to automatic review settings March 13, 2026 02:34

Copilot started reviewing on behalf of richardpark-msft March 13, 2026 02:35

Copilot AI reviewed Mar 13, 2026

Comment thread internal/models/spec_test.go Outdated

Comment thread internal/orchestration/runner.go Outdated

Comment thread cmd/waza/cmd_new_task.go Outdated

Richard Park added 3 commits March 13, 2026 02:48

Oops on those.

fdf2fea

More comments/strings

e312ee6

Singular, not plural taskspecs :)

626cc03

Copilot AI review requested due to automatic review settings March 13, 2026 02:50

Copilot started reviewing on behalf of richardpark-msft March 13, 2026 02:51

Copilot AI reviewed Mar 13, 2026

richardpark-msft closed this Mar 17, 2026

auto-merge was automatically disabled March 17, 2026 16:45
Pull request was closed

richardpark-msft deleted the wz-bug-cant-find-skills branch March 17, 2026 16:45

richardpark-msft wants to merge 14 commits into microsoft:main from richardpark-msft:wz-bug-cant-find-skills

3 participants
