feat: Snapshot auto-update workflow for diff grader#95
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #95 +/- ##
=======================================
Coverage ? 72.62%
=======================================
Files ? 129
Lines ? 14605
Branches ? 0
=======================================
Hits ? 10607
Misses ? 3201
Partials ? 797
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR implements the --update-snapshots flag for waza run, analogous to Jest's --updateSnapshot. When enabled, instead of failing on diff grader mismatches, the current workspace output is written back as the new snapshot baseline. It integrates the flag end-to-end: CLI flag → runner option → grader parameter → grader behavior → summary output.
Changes:
- Adds
UpdateSnapshotsfield toDiffGraderArgs/diffGraderand implements update/create logic incheckSnapshot, plus acountChangedLineshelper - Adds
WithUpdateSnapshotsrunner option that injectsupdate_snapshotsinto diff grader params duringrunGraders - Adds
--update-snapshotsCLI flag, threads it throughrunSingleModel, and renders the update summary inprintSnapshotUpdateSummary
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
internal/graders/diff_grader.go |
Core snapshot update logic: new snapshotUpdate struct, update/create paths in checkSnapshot, writeSnapshot helper, countChangedLines function, updated buildResult feedback |
internal/graders/diff_grader_test.go |
New test file covering mismatch-without-update (fail), update+create flow, and no-changes flow |
internal/orchestration/runner.go |
New updateSnapshots field on TestRunner, WithUpdateSnapshots option, injection of update_snapshots param into diff grader configs in runGraders |
internal/orchestration/runner_orchestration_test.go |
Integration test verifying the runner correctly updates snapshot files via WithUpdateSnapshots |
cmd/waza/cmd_run.go |
New updateSnapshots global flag, Cobra flag registration, WithUpdateSnapshots option passed to runner, printSnapshotUpdateSummary output function |
cmd/waza/cmd_run_test.go |
Flag-parsing test for --update-snapshots and reset in resetRunGlobals |
site/src/content/docs/reference/cli.mdx |
Adds --update-snapshots to the flags table and examples |
.squad/decisions.md |
Squad decision log entries for token diff strategy and model/workflow directive |
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md |
New session log file |
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md |
New session log file |
Comments suppressed due to low confidence (1)
internal/orchestration/runner.go:1227
- Same direct mutation of
vCfg.Parameters(via theparamsalias) issue as in the global validators loop above. When the diff grader parameters map is non-nil,params["update_snapshots"] = truemutates the original spec's parameter map rather than creating a copy, asinjectJudgeModeldoes.
if r.updateSnapshots && kind == models.GraderKindDiff {
params["update_snapshots"] = true
}
spboyer
left a comment
There was a problem hiding this comment.
LGTM — Rusty. Snapshot auto-update is well-designed. --update-snapshots flows cleanly through runner → diff grader. snapshotUpdate struct with create/update/unchanged states gives good UX feedback. countChangedLines is a nice touch. 3 grader tests + runner integration test = solid coverage. Ship it. (Self-authored PR — cannot self-approve via API.)
e019d5a to
7b76de9
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
8df06b1 to
03e93a5
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5bc4c57 to
3963ee3
Compare
wbreza
left a comment
There was a problem hiding this comment.
Code Review: PR #95 - feat: Snapshot auto-update workflow for diff grader
What Looks Good
- Path traversal protection via resolveSnapshotPath with symlink resolution
- Exported SnapshotUpdate struct and status constants
- parseSnapshotUpdates handles typed slices, pointer slices, and JSON fallback
- Non-update mode behavior is unchanged
- New snapshot creation when file doesn't exist + update mode
- CRLF normalization in line diff counting
High (1 finding)
- Params map mutation in runner.go - When vCfg.Parameters is non-nil, setting params["update_snapshots"] = true mutates the original spec grader config via reference alias. If the same spec is reused across multi-trial runs, snapshot updates would be enabled for all subsequent trials unintentionally. Verify whether the runner clones the params map or shares it.
Medium (2 findings)
- snapshot_updates always in Details - Empty slice emitted even when not in update mode. Harmless but noisy.
- countChangedLines is positional - Single line insertion reports all subsequent lines as changed. Acceptable for summary heuristic but may mislead.
Summary
| Priority | Count |
|---|---|
| Critical | 0 |
| High | 1 |
| Medium | 2 |
| Low | 0 |
Overall Assessment: Comment - solid implementation with good security (path traversal protection). The params map mutation concern should be verified before approving.
Closes #85