Doing bulk mechanical renames using Go refactor/rename #128
richardpark-msft wants to merge 14 commits into
- TestCase -> TaskSpec
- TestStimulus -> TaskInputs
- ValidatorInline -> Grader
Pull request overview
This PR applies a large set of mechanical renames across the evaluation (“benchmark”) pipeline, updating core model types and propagating those changes through orchestration, graders, caching, config, CLI, JSON-RPC handlers, and tests.
Changes:
- Renames `models.BenchmarkSpec` → `models.EvalSpec` and `models.Config` → `models.EvalConfig` across runtime code and tests.
- Renames task-level structures (`TestCase` → `TaskSpec`, `TestStimulus` → `TaskInputs`, `ValidatorInline` → `Grader`) and updates call sites.
- Renames grader identity/type plumbing (`GraderKind` → `GraderType`, `Grader.Kind()` → `Grader.Type()`), updating all grader implementations and tests.
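As a rough illustration of what the `Kind()` → `Type()` rename means for a grader implementation, here is a minimal sketch. The interface shape, the `Name()` method, the `textGrader` type, and the `GraderTypeText` constant are all assumptions made for the example, not the repository's actual definitions.

```go
package main

import "fmt"

// GraderType identifies a grader implementation (the type formerly named GraderKind).
type GraderType string

// GraderTypeText is a hypothetical constant used only in this sketch.
const GraderTypeText GraderType = "text"

// Grader is an assumed, pared-down shape of the grader interface after the rename:
// Type() takes the place of the old Kind() method.
type Grader interface {
	Name() string
	Type() GraderType
}

// textGrader is a hypothetical implementation.
type textGrader struct{ name string }

func (g textGrader) Name() string     { return g.name }
func (g textGrader) Type() GraderType { return GraderTypeText }

// Compile-time check: the build fails if textGrader does not implement Type().
var _ Grader = textGrader{}

// describe formats a grader as "name (type)".
func describe(g Grader) string {
	return fmt.Sprintf("%s (%s)", g.Name(), g.Type())
}

func main() {
	fmt.Println(describe(textGrader{name: "contains-greeting"}))
}
```

A compile-time assertion like `var _ Grader = textGrader{}` is one way to make a missed implementation surface as a build error during a bulk rename.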
Reviewed changes
Copilot reviewed 53 out of 54 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/trigger/runner_test.go | Updates tests to use EvalSpec instead of BenchmarkSpec. |
| internal/transcript/transcript_test.go | Updates transcript test fixtures to TaskSpec / TaskInputs. |
| internal/transcript/transcript.go | Updates BuildTaskTranscript to accept TaskSpec and read from Inputs. |
| internal/suggest/suggest.go | Updates YAML validation to unmarshal into EvalSpec. |
| internal/orchestration/runner_test.go | Updates runner tests for EvalSpec / EvalConfig and task structs. |
| internal/orchestration/runner_orchestration_test.go | Updates orchestration tests for renamed spec/task/grader fields and Type(). |
| internal/orchestration/runner.go | Converts runner task loading/execution/grading flow to TaskSpec / TaskInputs. |
| internal/orchestration/filter_test.go | Updates filter tests to operate on TaskSpec. |
| internal/orchestration/filter.go | Updates filter APIs to accept/return TaskSpec. |
| internal/orchestration/csv_integration_test.go | Updates CSV task generation tests to EvalSpec and TaskSpec.Inputs. |
| internal/orchestration/baseline_test.go | Updates baseline tests to use EvalSpec / EvalConfig. |
| internal/models/taskspec_test.go | Adds coverage for should_trigger YAML → ExpectedTrigger pointer behavior. |
| internal/models/taskspec.go | Renames task model types; updates inline grader struct to Grader with Type. |
| internal/models/spec.go | Renames spec/config types to EvalSpec / EvalConfig and updates grader kind type. |
| internal/models/outcome.go | Renames GraderKind → GraderType and updates results typing. |
| internal/models/grader_params_test.go | Updates polymorphic parameter tests to read task-level graders via tc.Graders. |
| internal/models/grader_params.go | Updates parameter decoding to accept GraderType. |
| internal/models/baseline_test.go | Updates baseline YAML test to unmarshal into EvalSpec. |
| internal/jsonrpc/handlers.go | Updates JSON-RPC eval handling to use EvalSpec / EvalConfig. |
| internal/graders/trigger_grader_test.go | Updates trigger grader tests for Type() and TaskSpec.Inputs. |
| internal/graders/trigger_grader.go | Updates trigger grader to implement Type() and read prompt from Inputs. |
| internal/graders/tool_constraint_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/tool_constraint_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/text_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/text_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/skill_invocation_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/skill_invocation_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/run.go | Updates runner entrypoint to accept TaskSpec and task-level Graders. |
| internal/graders/prompt_grader.go | Updates prompt grader interface to Type() and result typing to GraderType. |
| internal/graders/program_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/program_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/json_schema_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/json_schema_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/inline_script_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/inline_script_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/grader.go | Updates the grader interface (Type()) and context to reference TaskSpec. |
| internal/graders/file_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/file_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/diff_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/behavior_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/behavior_grader.go | Updates grader interface implementation to Type(). |
| internal/graders/action_sequence_grader_test.go | Updates tests from Kind() to Type(). |
| internal/graders/action_sequence_grader.go | Updates grader interface implementation to Type(). |
| internal/config/config_test.go | Updates config tests to pass EvalSpec. |
| internal/config/config.go | Updates BenchmarkConfig to store/return *EvalSpec. |
| internal/cache/cache_test.go | Updates cache tests to use EvalSpec and TaskSpec. |
| internal/cache/cache.go | Updates cache key computation to accept EvalSpec + TaskSpec and read resources from Inputs. |
| cmd/waza/newtask/converters_test.go | Updates new-task converter tests for TaskSpec and task-level Graders. |
| cmd/waza/newtask/converters.go | Updates Copilot log → task converter to build TaskSpec with Inputs + Graders. |
| cmd/waza/cmd_run_suggest_test.go | Updates suggest-related tests to use EvalSpec / EvalConfig. |
| cmd/waza/cmd_run_suggest.go | Propagates EvalSpec through suggestion/report generation helpers and task loading. |
| cmd/waza/cmd_run.go | Updates single-model execution path to accept *EvalSpec. |
| cmd/waza/cmd_new_task_test.go | Updates end-to-end new-task test expectations to TaskSpec/Graders. |
| cmd/waza/cmd_grade.go | Updates grading helpers to accept EvalSpec / TaskSpec. |
Codecov Report

| Metric | main | #128 | +/- |
|---|---|---|---|
| Coverage | ? | 73.51% | |
| Files | ? | 138 | |
| Lines | ? | 15785 | |
| Branches | ? | 0 | |
| Hits | ? | 11605 | |
| Misses | ? | 3338 | |
| Partials | ? | 842 | |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR performs a broad mechanical refactor/rename across the evaluation pipeline, updating core model types (specs, task/test definitions, and graders) and propagating those renames through orchestration, graders, config, cache, CLI, and related tests.
Changes:
- Rename core models: `BenchmarkSpec` → `EvalSpec`, `Config` → `EvalConfig`, `TestCase` → `TaskSpec`, `TestStimulus` → `TaskInputs`, inline task validators → `Grader` with `Kind` → `Type`.
- Update orchestration/runner, graders, caching, JSON-RPC handlers, and CLI code to use the new types/fields.
- Add/update tests to cover renamed structures (including a new test for `should_trigger` decoding).
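The `should_trigger` → `ExpectedTrigger` pointer behavior can be sketched as follows. This is only an analogy: the PR's test covers YAML decoding, while this example uses `encoding/json` to stay within the standard library, and the `taskSpec` struct and its tag are assumptions. The point is that a `*bool` field can distinguish "not specified" from an explicit `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskSpec is a hypothetical, trimmed-down stand-in for models.TaskSpec.
// JSON is used only to keep the sketch stdlib-only; the real field is decoded from YAML.
type taskSpec struct {
	ExpectedTrigger *bool `json:"should_trigger,omitempty"`
}

// describeTrigger reports which of the three states the decoded pointer is in.
func describeTrigger(doc string) (string, error) {
	var ts taskSpec
	if err := json.Unmarshal([]byte(doc), &ts); err != nil {
		return "", err
	}
	switch {
	case ts.ExpectedTrigger == nil:
		return "unset", nil
	case *ts.ExpectedTrigger:
		return "true", nil
	default:
		return "false", nil
	}
}

func main() {
	docs := []string{`{}`, `{"should_trigger": false}`, `{"should_trigger": true}`}
	for _, doc := range docs {
		state, err := describeTrigger(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", doc, state)
	}
}
```

With a plain `bool`, the first two documents would be indistinguishable after decoding.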
Reviewed changes
Copilot reviewed 53 out of 54 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/trigger/runner_test.go | Update trigger runner tests to use EvalSpec. |
| internal/transcript/transcript_test.go | Update transcript tests to use TaskSpec/TaskInputs. |
| internal/transcript/transcript.go | Accept *TaskSpec and read Inputs.Message for transcripts. |
| internal/suggest/suggest.go | Unmarshal eval YAML into EvalSpec instead of BenchmarkSpec. |
| internal/orchestration/runner_test.go | Update orchestration runner tests for EvalSpec/TaskSpec. |
| internal/orchestration/runner_orchestration_test.go | Update orchestration integration tests for renamed grader/task structures. |
| internal/orchestration/runner.go | Migrate task loading/execution path to TaskSpec and Inputs. |
| internal/orchestration/filter_test.go | Update filtering tests to operate on []*TaskSpec. |
| internal/orchestration/filter.go | Update filter API to operate on []*TaskSpec. |
| internal/orchestration/csv_integration_test.go | Update CSV task generation tests to validate Inputs.Message. |
| internal/orchestration/baseline_test.go | Update baseline orchestration tests to use EvalSpec/EvalConfig. |
| internal/models/taskspec_test.go | Add coverage for should_trigger decoding into ExpectedTrigger. |
| internal/models/taskspec.go | Rename task model types and inline graders; update YAML unmarshalling accordingly. |
| internal/models/spec.go | Rename spec/config types to EvalSpec/EvalConfig and update validation. |
| internal/models/outcome.go | Rename GraderKind type to GraderType (constants retained). |
| internal/models/grader_params_test.go | Update grader parameter decoding test to use Graders. |
| internal/models/grader_params.go | Update parameter decoding entrypoint to accept GraderType. |
| internal/models/baseline_test.go | Update baseline YAML parsing test to use EvalSpec. |
| internal/jsonrpc/handlers.go | Update eval get/validate handlers to use EvalSpec and EvalConfig. |
| internal/graders/trigger_grader_test.go | Update trigger grader tests for Type() and TaskSpec.Inputs. |
| internal/graders/trigger_grader.go | Rename grader interface method to Type() and update prompt access. |
| internal/graders/tool_constraint_grader_test.go | Update tool-constraint grader tests for Type(). |
| internal/graders/tool_constraint_grader.go | Implement Type() instead of Kind(). |
| internal/graders/text_grader_test.go | Update text grader tests for Type(). |
| internal/graders/text_grader.go | Implement Type() instead of Kind(). |
| internal/graders/skill_invocation_grader_test.go | Update skill invocation grader tests for Type(). |
| internal/graders/skill_invocation_grader.go | Implement Type() instead of Kind(). |
| internal/graders/run.go | Run task-level graders from TaskSpec.Graders and validate Type. |
| internal/graders/prompt_grader.go | Rename grader interface method to Type() and propagate into results. |
| internal/graders/program_grader_test.go | Update program grader tests for Type(). |
| internal/graders/program_grader.go | Implement Type() instead of Kind(). |
| internal/graders/json_schema_grader_test.go | Update JSON schema grader tests for Type(). |
| internal/graders/json_schema_grader.go | Implement Type() instead of Kind(). |
| internal/graders/inline_script_grader_test.go | Update inline-script grader tests for Type(). |
| internal/graders/inline_script_grader.go | Implement Type() instead of Kind(). |
| internal/graders/grader.go | Update grader interface to Type() and context to reference *TaskSpec. |
| internal/graders/file_grader_test.go | Update file grader tests for Type(). |
| internal/graders/file_grader.go | Implement Type() instead of Kind(). |
| internal/graders/diff_grader.go | Implement Type() instead of Kind(). |
| internal/graders/behavior_grader_test.go | Update behavior grader tests for Type(). |
| internal/graders/behavior_grader.go | Implement Type() instead of Kind(). |
| internal/graders/action_sequence_grader_test.go | Update action-sequence grader tests for Type(). |
| internal/graders/action_sequence_grader.go | Implement Type() instead of Kind(). |
| internal/config/config_test.go | Update config tests to use EvalSpec. |
| internal/config/config.go | Store *EvalSpec in BenchmarkConfig and update getter signature. |
| internal/cache/cache_test.go | Update cache tests for EvalSpec/TaskSpec/Inputs. |
| internal/cache/cache.go | Update cache key inputs to EvalSpec/TaskSpec and use Inputs.Resources. |
| cmd/waza/newtask/converters_test.go | Update task generation tests for TaskSpec and inline Graders. |
| cmd/waza/newtask/converters.go | Emit *TaskSpec, populate Inputs.Message, and append Graders. |
| cmd/waza/cmd_run_suggest_test.go | Update suggest tests to use EvalSpec/EvalConfig. |
| cmd/waza/cmd_run_suggest.go | Update suggest pipeline to accept *EvalSpec and load []*TaskSpec. |
| cmd/waza/cmd_run.go | Update run path to accept *EvalSpec in runSingleModel. |
| cmd/waza/cmd_new_task_test.go | Update end-to-end task generation test expected TaskSpec shape. |
| cmd/waza/cmd_grade.go | Update grading path to operate on *EvalSpec and *TaskSpec. |
Comments suppressed due to low confidence (1)
internal/models/taskspec.go:17
- `TaskSpec.Inputs` was renamed from `Stimulus`, but its JSON tag is still `json:"stimulus"`. This makes the JSON representation inconsistent with the field name and the task schema (which uses `inputs`), and is likely an accidental leftover from the mechanical rename. Consider changing the JSON tag to `json:"inputs"` (or removing the JSON tag if `TaskSpec` is not meant to be JSON-serialized).
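To make the effect of the stale tag concrete, the sketch below marshals two hypothetical structs that differ only in the JSON tag on `Inputs`. The struct names and the `message` tag are assumptions for illustration; the real `models.TaskSpec` has more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskInputs is a hypothetical, trimmed-down stand-in for models.TaskInputs.
type taskInputs struct {
	Message string `json:"message"`
}

// staleTag mirrors the flagged situation: the Go field is named Inputs,
// but the JSON tag still says "stimulus".
type staleTag struct {
	Inputs taskInputs `json:"stimulus"`
}

// fixedTag uses the tag suggested in the review comment.
type fixedTag struct {
	Inputs taskInputs `json:"inputs"`
}

// mustJSON marshals v and panics on error (acceptable for a sketch).
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	in := taskInputs{Message: "hello"}
	fmt.Println(mustJSON(staleTag{Inputs: in})) // {"stimulus":{"message":"hello"}}
	fmt.Println(mustJSON(fixedTag{Inputs: in})) // {"inputs":{"message":"hello"}}
}
```

A rename tool that rewrites Go identifiers does not touch struct tags, which is why this kind of mismatch tends to survive a mechanical rename.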
Pull request overview
Bulk mechanical rename sweep aligning core evaluation/task model naming across the Go codebase (EvalSpec/EvalConfig, TaskSpec/TaskInputs, Grader.Type) and updating all call sites accordingly.
Changes:
- Renamed `BenchmarkSpec` → `EvalSpec` and `Config` → `EvalConfig` across config loading/validation, orchestration, CLI, and JSON-RPC handlers.
- Renamed task model `TestCase` → `TaskSpec` (with `TestStimulus` → `TaskInputs`) and updated YAML/loader functions (`LoadTaskSpec`, task filtering, CSV task generation).
- Renamed grader APIs (`Grader.Kind()` → `Grader.Type()`, `GraderKind` → `GraderType`) and updated built-in graders + tests.
Reviewed changes
Copilot reviewed 62 out of 62 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/trigger/runner_test.go | Updates runner tests to construct configs from models.EvalSpec. |
| internal/transcript/transcript_test.go | Updates transcript test to use TaskSpec/TaskInputs. |
| internal/transcript/transcript.go | Updates BuildTaskTranscript to accept *models.TaskSpec and read from Inputs. |
| internal/suggest/suggest_test.go | Updates test comment to reference EvalSpec. |
| internal/suggest/suggest.go | Validates suggested YAML by unmarshalling into models.EvalSpec. |
| internal/suggest/prompt.go | Updates prompt wording to require EvalSpec YAML. |
| internal/orchestration/runner_test.go | Updates orchestration runner tests to use EvalSpec, EvalConfig, TaskSpec. |
| internal/orchestration/runner_orchestration_test.go | Updates orchestration integration-style tests for renamed models and grader fields. |
| internal/orchestration/runner.go | Renames task loading/execution pipeline to TaskSpec terminology and updates request building. |
| internal/orchestration/filter_test.go | Updates filter tests for FilterTaskSpecs and TaskSpec helpers. |
| internal/orchestration/filter.go | Renames filter entrypoint to FilterTaskSpecs and updates argument types. |
| internal/orchestration/csv_integration_test.go | Updates CSV task-loading tests to loadTaskSpecsFromCSV and EvalSpec. |
| internal/orchestration/baseline_test.go | Updates baseline tests to construct EvalSpec/EvalConfig. |
| internal/models/taskspec_test.go | Updates loader test names and calls to LoadTaskSpec. |
| internal/models/taskspec.go | Renames task structs to TaskSpec/TaskInputs and inline validators to Grader. |
| internal/models/spec_test.go | Renames spec/task loader tests to new loader functions/types. |
| internal/models/spec.go | Renames spec structs/functions to EvalSpec/EvalConfig/LoadEvalSpec. |
| internal/models/outcome.go | Renames GraderKind type to GraderType and updates result fields accordingly. |
| internal/models/grader_params_test.go | Updates parameter decoding tests for LoadEvalSpec/LoadTaskSpec and Graders. |
| internal/models/grader_params.go | Updates grader-parameter decoding signature to accept GraderType. |
| internal/models/baseline_test.go | Updates baseline YAML serialization tests to use EvalSpec. |
| internal/jsonrpc/handlers.go | Updates JSON-RPC handlers to load EvalSpec and return updated config types. |
| internal/graders/trigger_grader_test.go | Updates trigger grader tests for Type() and TaskSpec context. |
| internal/graders/trigger_grader.go | Updates trigger grader interface to Type() and reads prompt from TaskSpec.Inputs. |
| internal/graders/tool_constraint_grader_test.go | Updates tool-constraint grader tests for Type(). |
| internal/graders/tool_constraint_grader.go | Updates tool-constraint grader interface to Type(). |
| internal/graders/text_grader_test.go | Updates text grader tests for Type(). |
| internal/graders/text_grader.go | Updates text grader interface to Type(). |
| internal/graders/skill_invocation_grader_test.go | Updates skill-invocation grader tests for Type(). |
| internal/graders/skill_invocation_grader.go | Updates skill-invocation grader interface to Type(). |
| internal/graders/run.go | Updates RunAll signature to accept *models.TaskSpec and uses tc.Graders. |
| internal/graders/prompt_grader_test.go | Updates prompt grader test to load EvalSpec. |
| internal/graders/prompt_grader.go | Updates prompt grader interface to Type() and result typing. |
| internal/graders/program_grader_test.go | Updates program grader tests for Type(). |
| internal/graders/program_grader.go | Updates program grader interface to Type(). |
| internal/graders/json_schema_grader_test.go | Updates JSON-schema grader tests for Type(). |
| internal/graders/json_schema_grader.go | Updates JSON-schema grader interface to Type(). |
| internal/graders/inline_script_grader_test.go | Updates inline-script grader tests for Type(). |
| internal/graders/inline_script_grader.go | Updates inline-script grader interface to Type(). |
| internal/graders/grader.go | Updates grader interface (Type) and grading context (TaskSpec). |
| internal/graders/file_grader_test.go | Updates file grader tests for Type(). |
| internal/graders/file_grader.go | Updates file grader interface to Type(). |
| internal/graders/diff_grader.go | Updates diff grader interface to Type(). |
| internal/graders/behavior_grader_test.go | Updates behavior grader tests for Type(). |
| internal/graders/behavior_grader.go | Updates behavior grader interface to Type(). |
| internal/graders/action_sequence_grader_test.go | Updates action-sequence grader tests for Type(). |
| internal/graders/action_sequence_grader.go | Updates action-sequence grader interface to Type(). |
| internal/execution/copilot_test.go | Renames local test table variable from testCases to taskSpecs. |
| internal/config/config_test.go | Updates config tests to pass *models.EvalSpec. |
| internal/config/config.go | Updates BenchmarkConfig to hold *models.EvalSpec and adjusts getter types. |
| internal/cache/cache_test.go | Updates cache tests to use EvalSpec/TaskSpec/TaskInputs. |
| internal/cache/cache.go | Updates cache key inputs and fixture enumeration to use TaskSpec.Inputs.Resources. |
| cmd/waza/newtask/converters_test.go | Updates converter test to expect TaskSpec and inline Graders using Type. |
| cmd/waza/newtask/converters.go | Renames converter API to CreateTaskSpecFromCopilotLog and updates produced model fields. |
| cmd/waza/cmd_run_suggest_test.go | Updates suggest tests to construct EvalSpec/EvalConfig. |
| cmd/waza/cmd_run_suggest.go | Updates suggest pipeline to accept *models.EvalSpec and load TaskSpecs. |
| cmd/waza/cmd_run.go | Updates run command to load EvalSpec and pass it through execution. |
| cmd/waza/cmd_new_task_test.go | Updates new-task e2e test to load/compare TaskSpec with Graders. |
| cmd/waza/cmd_new_task.go | Updates new-task generation pipeline to use CreateTaskSpecFromCopilotLog. |
| cmd/waza/cmd_grade.go | Updates grade command to load EvalSpec and grade TaskSpec runs. |
| README.md | Updates internal/models documentation line to refer to EvalSpec/TaskSpec. |
| AGENTS.md | Updates architecture notes and naming table to reflect new model names. |
Comments suppressed due to low confidence (3)
internal/models/spec_test.go:83
- Several test function names look mangled by the mechanical rename (e.g., `TestBenchmarkEvaltsDeserialization`). Consider renaming these to clear `TestEvalSpec_...`-style names so test intent is obvious and consistent.
internal/models/taskspec.go:17
- `TaskSpec.Inputs` still has the JSON tag `json:"stimulus"`. Since `task.get` returns `TaskSpec` as JSON (see JSON-RPC handler), this will emit `stimulus` instead of `inputs` and is inconsistent with the YAML/schema. Update the JSON tag to `json:"inputs"` (and consider `omitempty` if appropriate).
internal/orchestration/runner.go:631
- The comment and error message here still refer to "test cases" even though this function loads `TaskSpec`s. Update wording (and the error message) to "tasks"/"task specs" to match the new naming.
Pull request overview
This PR performs a broad mechanical rename across the Go codebase to align terminology around “evals”, “tasks”, and “graders”, including updating loaders, runners, JSON-RPC handlers, and tests.
Changes:
- Rename core models/types: `BenchmarkSpec` → `EvalSpec`, `Config` → `EvalConfig`, `TestCase` → `TaskSpec`, `TestStimulus` → `TaskInputs`, `ValidatorInline` → `Grader`, `Grader.Kind` → `Grader.Type`.
- Update orchestration, graders, cache, transcript, JSON-RPC handlers, and CLI paths to use the renamed types/APIs.
- Rename and adjust tests/docs to match the new naming.
Reviewed changes
Copilot reviewed 62 out of 62 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/trigger/runner_test.go | Updates runner tests to construct EvalSpec. |
| internal/transcript/transcript_test.go | Updates transcript tests to use TaskSpec/TaskInputs. |
| internal/transcript/transcript.go | Updates transcript builder signature/field access for TaskSpec. |
| internal/suggest/suggest_test.go | Updates comment/expectations to refer to EvalSpec. |
| internal/suggest/suggest.go | Updates YAML validation to unmarshal into EvalSpec. |
| internal/suggest/prompt.go | Updates generated prompt text to reference EvalSpec. |
| internal/orchestration/runner_test.go | Updates orchestration runner tests for EvalSpec/TaskSpec. |
| internal/orchestration/runner_orchestration_test.go | Updates orchestration tests for EvalSpec/task graders naming. |
| internal/orchestration/runner.go | Renames task loading/filtering/execution plumbing to TaskSpec. |
| internal/orchestration/filter_test.go | Renames filter tests + helpers to FilterTaskSpecs. |
| internal/orchestration/filter.go | Renames filter API to operate on TaskSpec. |
| internal/orchestration/csv_integration_test.go | Renames CSV task generation tests to TaskSpec. |
| internal/orchestration/baseline_test.go | Updates baseline tests to use EvalSpec/EvalConfig. |
| internal/models/taskspec_test.go | Renames loader tests to LoadTaskSpec. |
| internal/models/taskspec.go | Introduces TaskSpec/TaskInputs/Grader rename and loader rename. |
| internal/models/spec_test.go | Renames spec loader tests to LoadEvalSpec and task loader tests to LoadTaskSpec. |
| internal/models/spec.go | Renames spec model/loader to EvalSpec/LoadEvalSpec and config type to EvalConfig. |
| internal/models/outcome.go | Renames GraderKind type to GraderType (constants retained). |
| internal/models/grader_params_test.go | Updates polymorphic grading parameter tests for EvalSpec/TaskSpec. |
| internal/models/grader_params.go | Updates parameter decoding to accept GraderType. |
| internal/models/baseline_test.go | Updates baseline serialization tests to EvalSpec. |
| internal/jsonrpc/handlers.go | Updates eval/task JSON-RPC handlers to load EvalSpec/TaskSpec. |
| internal/graders/trigger_grader_test.go | Updates trigger grader tests to Type() + TaskSpec context. |
| internal/graders/trigger_grader.go | Renames grader method to Type() and switches to TaskSpec in context. |
| internal/graders/tool_constraint_grader_test.go | Updates tests to assert Type() instead of Kind(). |
| internal/graders/tool_constraint_grader.go | Renames grader method to Type(). |
| internal/graders/text_grader_test.go | Updates tests to assert Type(). |
| internal/graders/text_grader.go | Renames grader method to Type(). |
| internal/graders/skill_invocation_grader_test.go | Updates tests to assert Type(). |
| internal/graders/skill_invocation_grader.go | Renames grader method to Type(). |
| internal/graders/run.go | Updates runner to accept TaskSpec and iterate TaskSpec.Graders. |
| internal/graders/prompt_grader_test.go | Updates spec loader to LoadEvalSpec. |
| internal/graders/prompt_grader.go | Renames grader method to Type() and updates result construction. |
| internal/graders/program_grader_test.go | Updates tests to assert Type(). |
| internal/graders/program_grader.go | Renames grader method to Type(). |
| internal/graders/json_schema_grader_test.go | Updates tests to assert Type(). |
| internal/graders/json_schema_grader.go | Renames grader method to Type(). |
| internal/graders/inline_script_grader_test.go | Updates tests to assert Type(). |
| internal/graders/inline_script_grader.go | Renames grader method to Type(). |
| internal/graders/grader.go | Renames interface method to Type() and context field to TaskSpec. |
| internal/graders/file_grader_test.go | Updates tests to assert Type(). |
| internal/graders/file_grader.go | Renames grader method to Type(). |
| internal/graders/diff_grader.go | Renames grader method to Type(). |
| internal/graders/behavior_grader_test.go | Updates tests to assert Type(). |
| internal/graders/behavior_grader.go | Renames grader method to Type(). |
| internal/graders/action_sequence_grader_test.go | Updates tests to assert Type(). |
| internal/graders/action_sequence_grader.go | Renames grader method to Type(). |
| internal/execution/copilot_test.go | Renames local vars in tests from test-case terminology to task terminology. |
| internal/config/config_test.go | Updates config tests to build configs with EvalSpec. |
| internal/config/config.go | Updates BenchmarkConfig to store an *EvalSpec. |
| internal/cache/cache_test.go | Updates cache tests to use EvalSpec/TaskSpec. |
| internal/cache/cache.go | Updates cache key computation to accept EvalSpec/TaskSpec. |
| cmd/waza/newtask/converters_test.go | Renames converter tests to CreateTaskSpecFromCopilotLog. |
| cmd/waza/newtask/converters.go | Renames converter API to produce TaskSpec with task-level graders. |
| cmd/waza/cmd_run_suggest_test.go | Updates suggest tests to use EvalSpec/EvalConfig. |
| cmd/waza/cmd_run_suggest.go | Updates suggest plumbing to accept *EvalSpec and load TaskSpecs. |
| cmd/waza/cmd_run.go | Updates eval runner to load EvalSpec and pass it through. |
| cmd/waza/cmd_new_task_test.go | Updates end-to-end new-task tests to load TaskSpec. |
| cmd/waza/cmd_new_task.go | Updates command implementation to use the renamed newtask converter API. |
| cmd/waza/cmd_grade.go | Updates grading command to load EvalSpec and grade TaskSpec runs. |
| README.md | Updates repository structure docs to reflect new type names. |
| AGENTS.md | Updates architecture docs to reflect new filenames/type names. |
Comments suppressed due to low confidence (3)
internal/models/taskspec.go:16
- `TaskSpec.Inputs` is now the canonical field name, but the JSON tag is still `json:"stimulus"`. Since `task.get` returns `*models.TaskSpec` directly, this makes the JSON-RPC payload inconsistent with the rename (and likely with the YAML/schema key `inputs`). Consider updating the JSON tag to `inputs` (or, if backward compatibility is required, return a separate DTO from the handler to keep the old field name).
This issue also appears on line 62 of the same file.
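The "separate DTO" option mentioned in this comment could look roughly like the sketch below: the internal model carries the new `inputs` tag, while the handler converts to a wire type that keeps `stimulus` for existing clients. All names here (`legacyTaskDTO`, `toLegacyDTO`, the `id` and `message` tags) are hypothetical and not taken from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskInputs and taskSpec are hypothetical, trimmed-down internal models.
type taskInputs struct {
	Message string `json:"message"`
}

type taskSpec struct {
	TestID string     `json:"id"`
	Inputs taskInputs `json:"inputs"`
}

// legacyTaskDTO is an assumed wire shape that preserves the old "stimulus" key
// for RPC clients that have not migrated yet.
type legacyTaskDTO struct {
	TestID   string     `json:"id"`
	Stimulus taskInputs `json:"stimulus"`
}

// toLegacyDTO converts the internal model to the backward-compatible wire shape.
func toLegacyDTO(ts *taskSpec) legacyTaskDTO {
	return legacyTaskDTO{TestID: ts.TestID, Stimulus: ts.Inputs}
}

// encodeLegacy is what a handler might return instead of the model itself.
func encodeLegacy(ts *taskSpec) (string, error) {
	b, err := json.Marshal(toLegacyDTO(ts))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	ts := &taskSpec{TestID: "t1", Inputs: taskInputs{Message: "hello"}}
	out, err := encodeLegacy(ts)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":"t1","stimulus":{"message":"hello"}}
}
```

The trade-off is an extra conversion step in the handler in exchange for decoupling the RPC payload from future model renames.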
internal/models/taskspec.go:66
- `Grader.Identifier` is still the Go field name used throughout the codebase, but its JSON tag was changed to `json:"name"`. Because `task.get` returns the `TaskSpec` directly, this is an API breaking change and also inconsistent with `models.GraderConfig.Identifier` (which serializes as `identifier`) and `models.GraderResults.Name` (which serializes as `identifier`). Consider keeping the JSON tag as `identifier` (or rename the field to `Name` everywhere) to avoid surprising RPC consumers.
cmd/waza/cmd_run_suggest.go:518
- This error message still says "failed to load test case" even though `loadTaskSpecsFromFiles` loads `TaskSpec`s. Renaming it to "failed to load task spec" (or "task") would keep terminology consistent with the rest of the renamed code.

    for _, path := range testFiles {
        tc, err := models.LoadTaskSpec(path)
        if err != nil {
            return nil, fmt.Errorf("failed to load test case %s: %w", path, err)
        }
     yml, err := yaml.Marshal(taskSpec)
     if err != nil {
    -    return nil, fmt.Errorf("marshaling test case %s: %w", tc.TestID, err)
    +    return nil, fmt.Errorf("marshaling test case %s: %w", taskSpec.TestID, err)
     }
     // CacheKey generates a unique cache key for a test case run
     // The key is based on:
     // - spec content (name, config, graders)
     // - task content (test case definition)
     // - model ID
     // - fixture file hashes
    -func CacheKey(spec *models.BenchmarkSpec, task *models.TestCase, fixtureDir string) (string, error) {
    +func CacheKey(spec *models.EvalSpec, task *models.TaskSpec, fixtureDir string) (string, error) {
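The key composition described in the `CacheKey` comment (spec content, task content, model ID, fixture file hashes) can be sketched as a hash over a deterministic serialization. This is not the repository's implementation: the struct fields are invented, and the sketch takes pre-computed fixture hashes instead of walking a `fixtureDir`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// evalSpec and taskSpec are hypothetical, simplified stand-ins for the real models.
type evalSpec struct {
	Name    string `json:"name"`
	ModelID string `json:"model_id"`
}

type taskSpec struct {
	TestID  string `json:"id"`
	Message string `json:"message"`
}

// cacheKey hashes the spec, the task, and a map of fixture path -> content hash.
// Assumption: fixture hashes are computed elsewhere and passed in.
func cacheKey(spec evalSpec, task taskSpec, fixtureHashes map[string]string) (string, error) {
	payload := struct {
		Spec     evalSpec          `json:"spec"`
		Task     taskSpec          `json:"task"`
		Fixtures map[string]string `json:"fixtures"`
	}{spec, task, fixtureHashes}

	// encoding/json emits map keys in sorted order, so the bytes are deterministic.
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	spec := evalSpec{Name: "demo", ModelID: "model-a"}
	task := taskSpec{TestID: "t1", Message: "hello"}
	key, err := cacheKey(spec, task, map[string]string{"a.txt": "abc123"})
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Any change to the spec, the task, the model ID, or a fixture hash yields a different key, which is the property the comment describes.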
     // If taskPatterns and tagPatterns are specified the result is the intersection of the matches between them.
     // If both taskPatterns and tagPatterns are empty, all test cases are returned.
    -func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error) {
    +func FilterTaskSpecs(taskSpecs []*models.TaskSpec, taskPatterns []string, tagPatterns []string) ([]*models.TaskSpec, error) {
         if len(taskPatterns) == 0 && len(tagPatterns) == 0 {
    -        return testCases, nil
    +        return taskSpecs, nil
         }

    -    var matched []*models.TestCase
    +    var matched []*models.TaskSpec

    -    for _, tc := range testCases {
    -        taskNameMatch, err := matchesTaskOrDisplayName(tc, taskPatterns)
    +    for _, taskSpec := range taskSpecs {
    +        taskNameMatch, err := matchesTaskOrDisplayName(taskSpec, taskPatterns)

             if err != nil {
                 return nil, err
             }

    -        tagNameMatch, err := matchesTags(tc, tagPatterns)
    +        tagNameMatch, err := matchesTags(taskSpec, tagPatterns)

             if err != nil {
                 return nil, err
             }

             if taskNameMatch && tagNameMatch {
    -            matched = append(matched, tc)
    +            matched = append(matched, taskSpec)
             }
         }

         return matched, nil
     }

     // matchesTaskOrDisplayName reports whether a test case's DisplayName or TestID matches any pattern.
    -func matchesTaskOrDisplayName(tc *models.TestCase, patterns []string) (bool, error) {
    +func matchesTaskOrDisplayName(tc *models.TaskSpec, patterns []string) (bool, error) {
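The intersection semantics in the doc comment above can be tried out with a self-contained sketch. Two things here are assumptions rather than facts from the diff: that patterns are shell-style globs (matched with `path.Match`), and that an empty pattern list on one side matches everything. The `taskSpec` struct and `matchesAny` helper are likewise hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// taskSpec is a hypothetical, trimmed-down stand-in for models.TaskSpec.
type taskSpec struct {
	TestID      string
	DisplayName string
	Tags        []string
}

// matchesAny reports whether any candidate matches any glob pattern.
// Assumption: an empty pattern list places no constraint and matches everything.
func matchesAny(patterns []string, candidates ...string) (bool, error) {
	if len(patterns) == 0 {
		return true, nil
	}
	for _, p := range patterns {
		for _, c := range candidates {
			ok, err := path.Match(p, c)
			if err != nil {
				return false, err
			}
			if ok {
				return true, nil
			}
		}
	}
	return false, nil
}

// filterTaskSpecs keeps tasks that satisfy both the name and the tag constraint.
func filterTaskSpecs(specs []*taskSpec, taskPatterns, tagPatterns []string) ([]*taskSpec, error) {
	if len(taskPatterns) == 0 && len(tagPatterns) == 0 {
		return specs, nil
	}
	var matched []*taskSpec
	for _, ts := range specs {
		nameOK, err := matchesAny(taskPatterns, ts.TestID, ts.DisplayName)
		if err != nil {
			return nil, err
		}
		tagOK, err := matchesAny(tagPatterns, ts.Tags...)
		if err != nil {
			return nil, err
		}
		if nameOK && tagOK {
			matched = append(matched, ts)
		}
	}
	return matched, nil
}

func main() {
	specs := []*taskSpec{
		{TestID: "auth-login", Tags: []string{"smoke"}},
		{TestID: "auth-logout", Tags: []string{"slow"}},
		{TestID: "billing-refund", Tags: []string{"smoke"}},
	}
	got, err := filterTaskSpecs(specs, []string{"auth-*"}, []string{"smoke"})
	if err != nil {
		panic(err)
	}
	for _, ts := range got {
		fmt.Println(ts.TestID) // auth-login
	}
}
```

Only `auth-login` satisfies both the `auth-*` name pattern and the `smoke` tag, which is the intersection behavior the comment describes.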
Going to take a run at this with copilot and just not bother trying to rebase/merge :)