feat: --compare=old:new — two-file diff for rolling agent reviews#163
feat: --compare=old:new — two-file diff for rolling agent reviews#163umputun merged 14 commits intoumputun:masterfrom
--compare=old:new — two-file diff for rolling agent reviews#163Conversation
umputun
left a comment
There was a problem hiding this comment.
recommendation: scrap --compare and instead extend --stdin to parse unified-diff input (--stdin-format=patch or auto-detect diff --git headers). That covers git diff --no-index a b | revdiff --stdin --stdin-format=patch in maybe 30 lines vs ~290 here. A calling skill wraps either invocation just as cleanly so launch friction doesn't matter, and the remaining differences (info popup label, both-paths display) are cosmetic and solvable on the stdin side with small extensions. Same workflow, much less surface area, no new top-level mode.
separately: the framing "rolling agent review" oversells what this is. The actual gap is mechanical: revdiff has no clean way to diff two on-disk files when neither side is anchored in a git ref. That's a generic primitive (config backups, upstream-vs-fork comparisons, etc), not specifically agent-shaped.
independent of which approach: plugin docs not synced. .claude-plugin/skills/revdiff/references/usage.md updated but plugins/codex/skills/revdiff/references/usage.md was not. Per CLAUDE.md they must stay byte-identical.
issues below apply only if you keep --compare as proposed; switching to the stdin path would moot most:
-
colon path parsing is fragile.
strings.Cut(opts.Compare, ":")atapp/compare.go:65splits on first colon. Breaks legal POSIX filenames containing:(e.g. timestamped snapshots like2026-05-01T12:30:00.md) and forecloses any future Windows port. Zero tests for colon-in-path. Options: keep syntax and document the limit, switch to two repeated flags (--compare-old=/--compare-new=, matches--include/--exclude), or two positional args (revdiff --compare /old /new, matchesrevdiff master feature). Either alternative kills the bug. -
DRY hazard.
validateCompareFlag(app/compare.go:13-58) andresolveComparePaths(app/compare.go:64-75) both parseopts.Comparewithstrings.Cut(":", ...)separately. Either have validate return(absOld, absNew, error)or extract a privateparseCompareFlagshared by both, otherwise the parser semantics drift. -
fragile test.
TestParseArgs_CompareConflictsuses non-existent/tmp/a:/tmp/bpaths and only passes because conflict checks return beforeos.Stat. If check order ever flips to stat-first, every conflict subtest silently starts passing for the wrong reason. Use real tempdir paths or assert both error type and message. -
countFileLinesshould be a method onCompareReader.app/diff/compare.go:79, called only from(r *CompareReader).FileDiff. Project rule: functions called only from struct methods must be methods. -
test coverage gaps. Nothing covers: paths-with-colons (relates to #1), binary files in compare mode, large files (the streaming
countFileLinesexists for this reason and isn't exercised), no-trailing-newline through the full compare flow, exit-code-2 from git, stderr-only success path, symlinks, orresolveComparePathsdirectly. -
mutual-exclusion check.
--compareis not excluded from--description/--description-file. Intentional (orthogonal context) or oversight?
minor: insert "--" before path args in the argv at app/diff/compare.go:34. filepath.Abs already makes paths absolute so -flag-shaped paths can't reach git in practice, but the -- hardens that guarantee against future refactors.
tests/lint/race green, argv-style exec safe, no scope creep, no vendor changes.
|
@umputun My use case for this is a planning hook in the agent loop. I've already validated the approach locally with --compare, but I can adapt to a stdin-based implementation if you'd prefer. On --compare vs. extending stdin: I'd push back gently on the redesign. revdiff is a diff review tool - "diff two files" feels like a primary operation, not a corner case to compose. The stdin route would also lose compact-mode toggle and reload-on-demand (the bytes are consumed; revdiff doesn't own the source). The 290-line cost is real, but I think the symmetry with existing modes (--staged, --all-files, --only) and the cleaner UX justify a first-class flag rather than a pipe-composed alternative. Here's the design for plan-review-hook.py: Design: plan-content-as-stateThe hook is stateless on disk between invocations — Claude Code doesn't persist hook state, and inventing a sidecar registry would be brittle across parallel sessions. Instead, the plan itself carries its diff history via an html-comment marker on the first line: <!-- previous revision: /tmp/plan-rev-AAA.md -->The marker is invisible in rendered markdown, trivially greppable, and self-cleaning (each plan chain references its own snapshot, so parallel sessions can't collide). On every
Does this plan look OK to you? |
|
fair points on compact-mode and orchestrator. Compact toggle re-runs the diff with smaller context, which line-level issues from the original review still stand: colon parsing fragility (#1), plugin docs sync miss, DRY hazard, fragile test, on the hook design: the marker-in-plan approach is clever. It keeps state in band so parallel sessions don't collide and hooks-are-stateless isn't a problem. Two concerns:
design looks good otherwise. Ship it once the line-level items are addressed. |
feat: implement CompareReader for git diff --no-index two-file diff feat: wire --compare flag into main.go run() function Switch renderer selection to a switch statement covering compare, stdin, and VCS modes. CompareReader gets workDir from filepath.Dir of the new path. commitsApplicable and compactApplicable already handle CompareReader correctly without changes. Also remove stale nolintlint directive in overlay/info.go. chore: verify acceptance criteria for --compare flag All acceptance criteria verified: diff between two files, identical-file empty diff, compact mode with dividers, word-diff, and all mutual-exclusion error cases. Full test suite passes (make test). feat: use --compare in planning plugin for plan iteration diffs After Claude revises an annotated plan, re-opening revdiff now shows only what changed via --compare=prev:current instead of the full file. First open uses --only as before. Pass --cleanup to discard the snapshot. docs: add --compare flag to README, docs, and plugin usage reference Document the two-file diff mode in all three locations: README (Options table, Examples, new Two-File Diff section), site/docs.html (options table, examples, new section), and the Claude plugin usage reference. fix: address code review findings - validateCompareFlag: reject --include and --exclude (silently ignored in compare mode) - main.go: use filepath.Abs before filepath.Dir to avoid "project: ." in info overlay for relative paths - compare.go: replace pre-allocated make+append slice with composite literal (removes wrong mnd comment) - compare_test.go: use t.TempDir() paths instead of literal /tmp/ in TestCompareReader_ChangedFiles - renderer_setup_test.go: add CompareReader case to TestCompactApplicable, --compare case to TestCommitsApplicable - launch-plan-review.sh: guard PLAN_ABS cd against missing directory so --cleanup survives deleted plan dirs - CLAUDE.md: document --compare mode in setupVCSRenderer, mutual-exclusion, compact mode, and project structure fix: address code review findings - diffError: exit code 1 with stdout content is success even if stderr has warnings; only exit code 1 with empty stdout is a real failure (e.g. missing file) - main.go: resolve both --compare paths to absolute before passing to NewCompareReader - docs: add --include/--exclude to --compare mutual-exclusion list in README, docs.html, and usage.md - reviewinfo: add Compare field to ReviewInfoConfig; reviewHeaderText returns "two-file diff" instead of "standalone files" for --compare mode - renderer_setup_test: add compare mode case to TestReloadApplicable fix: address code review findings - Handle filepath.Abs errors in run() compare case instead of discarding with _; extracted into resolveComparePaths helper to keep gocyclo ≤20 - Move plan snapshot (cp PLAN_ABS PREV_FILE) to after each overlay successfully exits in launch-plan-review.sh; snapshot before launch caused stale prev-file if the terminal command failed mid-dispatch fix: address code review findings - countFileLines: stream-count lines instead of os.ReadFile to avoid loading the entire old file into memory when compact mode is active with --compare; behavior is identical (non-newline-terminated file counts as one additional line, empty file returns 0) - launch-plan-review.sh: derive PREV_FILE extension from the actual plan file's extension instead of hardcoding .md, so syntax highlighting in the diff header is correct for non-markdown plans fix: address code review findings - Reject --compare combined with --annotations (silent annotation drop) - Move completed plan to docs/plans/completed/ chore: remove plan file and revert planning-plugin changes (separate PR) fix: validate --compare paths are regular files
…ate parse path The single --compare=old:new flag forced custom colon-splitting and could not represent file paths containing colons (e.g. ISO timestamps). Replace it with paired --compare-old / --compare-new flags that mirror the --include / --exclude shape, drop the strings.Cut parser entirely, and fold resolveComparePaths into validateCompareFlag so the flag has a single parse + validate site. Absolute paths now stash on opts at parse time and run() consumes them directly. Conflict checks still run before stat checks so a misuse error is not masked by a missing-file error. --description and --description-file remain orthogonal — compare mode keeps the info-popup prose channel. Updates README, site/docs.html, and the diff-review skill usage docs in the same commit so released docs do not ship stale.
…ileLines Insert -- between flag args and path args in the git diff --no-index invocation so a path that begins with a dash cannot be parsed as a flag, even if a future caller bypasses filepath.Abs. countFileLines now hangs off CompareReader and reads r.oldPath directly — the standalone form took a path parameter only because it was written before the constructor settled. The nolint:gosec on os.Open drops with it (the rule only flagged the parameter form).
…ks, error paths CompareConflicts now uses real t.TempDir() files so a future stat-first reordering would not silently green these subtests — the assertion under test is the conflict, not the file existence. New CompareReader cases: - paths_with_colons_in_name pins the OG bug killed by the two-flag form - binary_files exercises the "Binary files ... differ" stdout path - no_trailing_newline matrix (4 subcases) covers EOF-newline corners - git_error_surfaces_stderr asserts diffError preserves git's stderr - symlinks (both-symlink + mixed) confirms Stat-follows-symlink contract - count_file_lines_large_file_streaming asserts a >5MB read stays under a HeapAlloc cap of file_size/4 (32KB buffer) so a future refactor that slurps the file would fail loudly
show only what changed between plan revisions instead of re-rendering the full plan after every rewrite. plan content carries its own diff history via a first-line html-comment marker pointing at the previous snapshot: <!-- previous revision: /tmp/plan-rev-AAA.md --> on each ExitPlanMode the hook strips the marker, writes the new revision to a snapshot, and either opens revdiff in --compare against the marker's target (when present on disk) or falls back to --only mode. on annotations the hook denies and tells the agent to put the new snapshot path as the first line of the next plan; on no annotations it cleans up. launch-plan-review.sh now dispatches on argc: 1 → --only=<plan>, 2 → --compare=<old>:<new>, with PLAN_FILE pointing at the new revision for the overlay title.
in rolling plan-revision review the user wants to read the new state with the round's edits highlighted, not pore over the full +/- diff against the previous revision. --collapsed hides removed lines while preserving inline highlights on what changed, which matches how human reviewers naturally read prose revisions. scoped to compare mode only (2-arg invocation). --only mode shows the file as context-only with no removed lines, so --collapsed would be a no-op.
Mirror the flag-shape change from compare-flag's refactor commit. The launcher now builds REVDIFF_ARGS as two pre-shell-quoted tokens in compare mode (--compare-old=<old> --compare-new=<new>) instead of the old colon-joined --compare=<old>:<new> form. sq() moved to the top of the file so the args block can use it; the duplicate definition lower down is gone.
…olling-review contract The previous-iteration snapshot was unlinked unconditionally after the launcher subprocess returned, regardless of whether the launcher actually ran revdiff against the user. On any launcher non-zero exit (no overlay terminal available, AppleScript split error, cmux surface lookup failure, etc.) the user never saw the diff but old_snap was wiped, dangling the next round's marker and silently dropping the chain to --only. Gate the unlink on result.returncode == 0 and surface launcher failure to the user via the hook's ask response instead of pretending the plan was reviewed clean. README adds a "Rolling reviews" section covering the marker contract, the snapshot lifecycle (now correct on the failure path), and an explicit acknowledgment of the agent-discipline failure mode (forgotten marker → fallback to --only, self-heals on the next round). Override section notes the two-args / one-arg dispatch and the failure semantics so override authors know not to obscure the launcher's exit code.
… docs Security - planning hook restricts the agent-supplied "previous revision" marker to a trusted snapshot under \$TMPDIR with the plan-rev-* prefix. without this guard the agent could point --compare-old at any readable file (~/.ssh/id_rsa, /etc/passwd, …) and surface its contents through the diff overlay and subsequent annotations. - marker regex now tolerates leading whitespace before <!-- so common LLM drift (blank line, indented fence) does not silently collapse the rolling-review chain to --only mode. Hook resilience - subprocess.run is now wrapped in try/except for TimeoutExpired and OSError; both branches preserve old_snap and clean up the orphan new_snap so the rolling chain survives a hung or failed-to-start launcher. Test rigor - drop three tautological "compare mode" rows in renderer_setup_test.go (TestCommitsApplicable, TestReloadApplicable) that passed for reasons unrelated to compare logic; rename the TestCompactApplicable case to honestly reflect what it pins (CompareReader != FileReader). - TestCompareReader_FileDiff_GitErrorSurfacesStderr no longer accepts the bare "exit" fallback substring; requires an actual stderr token. - TestCompareReader_FileDiff_BinaryFiles now asserts the exact one-placeholder contract instead of vacuously iterating an empty slice. - TestCompareReader_FileDiff_Symlinks asserts non-zero add+remove counts matching git's actual symlink-object semantics (not the original NotEmpty-only check). Cleanup - collapse the eight repeated conflict checks in validateCompareFlag into a slice-driven loop; reads better, mirrors --include/--exclude shape on the data side. Docs - CLAUDE.md: replace 3 stale --compare=old:new references with --compare-old/--compare-new. - CHANGELOG.md: new "Unreleased" section covering --compare-old/--compare-new and the rolling planning-hook review. - site/index.html: extend All-files-mode card to mention --compare-old/--compare-new.
…on, etc. Security - close TOCTOU on the planning hook's marker path. trusted_snapshot now returns the canonical resolved Path (or None) and the caller uses it as old_snap; previously the hook validated the resolved target but handed the unresolved candidate (potentially a symlink) to git/revdiff, letting the symlink be re-pointed between validation and use. Hook resilience - propagate revdiff's exit code through every async launcher branch (zellij, kitty, wezterm, cmux, ghostty, iTerm2, emacs vterm) by writing the rc to the sentinel atomically (printf + mv -f) and exiting the outer launcher with that rc instead of an unconditional exit 0. without this, revdiff failures inside an overlay terminal silently surfaced as "plan reviewed, no annotations" — bypassing the new returncode-gating logic and the rolling-review chain. tmux already propagates exit code natively via display-popup -E. Quality - countFileLines now treats non-EOF read errors as "unknown" (returns 0) rather than returning a partial count; a partial count would mislabel the trailing-divider "⋯ N lines ⋯" footer. Docs - add --annotations to the compare-mode mutual-exclusion list in README.md, CLAUDE.md, site/docs.html, and the diff-review skill usage.md so the documented contract matches the validator.
mv -f atomically renames SENTINEL.tmp to SENTINEL on the success path, so the temp file does not normally outlive the script — but if the launcher or overlay terminal is killed mid-write, .tmp leaks in TMPDIR. add it to each async branch's trap list so the cleanup is robust under signals.
both branches inline the inner script via sh -c instead of writing a LAUNCH_SCRIPT, so the existing async-branch traps did not cover them. add a per-branch trap so SENTINEL and SENTINEL.tmp clean up on signal.
Adds the --compare-old/--compare-new inline example and the full "Two-File Diff" section that landed in the claude-plugin reference when --compare flags shipped. The codex skills are a content-mirror of the claude-plugin skills (copies, not symlinks); usage.md drifted because earlier compare-flag updates only touched the claude-plugin side.
|
Pushed fixes addressing the review items:
Added hook implementation — PR description is updated |
umputun
left a comment
There was a problem hiding this comment.
round-1 items addressed.
new findings:
-
stale launcher override silent-failure. The README documents
${CLAUDE_PLUGIN_DATA}/scripts/launch-plan-review.shas the supported customization path. Anyone who copied the bundled launcher there before this PR gets bitten silently:- pre-PR launcher:
if [ $# -lt 1 ]; then exit 1; fi; PLAN_FILE="$1"accepts$# >= 1and only reads$1 - new hook in compare mode passes
<old_snap> <new_snap>(plan-review-hook.py:180-183) - resolver picks the user override (user layer wins over bundled)
- stale override accepts the call, ignores
$2, opens revdiff against the OLD revision - user sees previous content (which they already reviewed), submits no annotations
- hook reports "no annotations" -> plan accepted, user never saw the new revision
the README explicitly tells users to copy the bundled launcher as a starting template, so this is a real upgrade hazard, not theoretical. Probably needs a fix before merge.
- pre-PR launcher:
-
TestParseArgs_CompareConflictsassertsNotEmptyon privatecompareAbsOld/compareAbsNewcache fields (app/compare_test.go:23-24). Confirms population but not correctness. Compare against expectedfilepath.Abs(oldFile)value, or drop and rely on end-to-end coverage inapp/diff/compare_test.go.
discussion-only:
trusted_snapshot (plan-review-hook.py:47-66) gates by $TMPDIR + plan-rev- prefix but doesn't bind to per-chain state. A crafted or accidentally-pointing marker can resolve to another in-flight session's snapshot. That means parallel-session review against the wrong baseline, plus the post-success unlink breaks the other chain. Self-heals via --only fallback so practical impact is one round of focused-diff loss. Worth a sidecar nonce eventually if you ever see two sessions running at once in practice.
otherwise lgtm once the launcher hazard above is resolved.
…rompt
- swap compare-mode arg order from (old, new) to (new, old) in both the
bundled launcher's 2-arg branch and the planning hook so that stale
user-layer overrides at ${CLAUDE_PLUGIN_DATA}/scripts/ — copies of
master's 1-arg launcher made before compare mode shipped — silently
degrade to --only review of the NEW revision instead of opening the
OLD one the user already reviewed. revdiff itself still receives
--compare-old=<OLD> --compare-new=<NEW>; the launcher relabels locally.
- tighten the deny-reason text so the agent is explicitly told to ignore
any other plan-rev-*.md marker it may have seen earlier in the
conversation, addressing intra-session chain-mixup at the prompt layer.
- replace assert.NotEmpty on opts.compareAbsOld/New with equality against
filepath.Abs(input) so the test would actually catch a swapped-fields
bug rather than just confirming the cache is populated.
- document the new launcher contract (new-revision first) and the
graceful-degradation rationale in the planning plugin README.
|
addressed new findings
|
Motivation
revdiff is built for rolling review loops with a Claude agent: the agent produces content, the user annotates lines, the agent revises, the loop repeats until the user quits without annotations. This works well today for code diffs, where each iteration is naturally framed as a
+/-view of what the agent just changed.The same loop falls apart for agent-rewritten documents — plans, design docs, specs, generated reports. Today the agent can only re-open the rewritten file as context-only (
--only), so the user has to re-read the whole document to find what changed in response to their last round of annotations. Long documents become exhausting to review iteratively, and small revisions are easy to miss.--compare=old:newfixes this. The agent snapshots the document before rewriting it, then re-opens revdiff against the snapshot:Each iteration becomes a focused diff of just this round's edits, with full annotation support — the same loop revdiff already supports for code, now extended to any document the agent revises. The missing primitive for iterative agent-driven document review.
Usage
--compareis mutually exclusive with: refs,--staged,--only,--all-files,--stdin,--include,--exclude,--annotations.How it works
CompareReaderrunsgit diff --no-indexand feeds the output through the sameparseUnifiedDiffparser used by every other diff source. Because the display layer is unchanged, every existing feature works automatically: word-diff, compact mode with correct line-count dividers, syntax highlighting, scrollbar, and inline annotations.A few non-obvious details:
os.ReadFile, so large files don't get pulled into memory.git diff --no-indexworks anywheregitis on$PATH.Changes
app/diff/compare.go— newCompareReadertype implementingRendererapp/config.go—--compareflag +validateCompareFlag(parsesold:new, validates both paths are regular files, enforces mutual exclusions)app/main.go— wires compare mode before stdin/VCS dispatch;resolveComparePathsresolves both paths to absolute;reloadApplicablereturns true for compare modeapp/reviewinfo.go— info overlay shows "two-file diff" header for compare modeVerification
Update
feat(planning-hook): rolling compare-mode review across plan revisions
Heads up — also rolled in the planning-hook orchestration that was explicitly punted in the original PR scope ("A planning skill / hook that orchestrates this loop automatically … is not part of this PR.").
It lives in
plugins/revdiff-planning/and is the first concrete consumer of--compare-old/--compare-newend-to-end.What it does. Intercepts
ExitPlanMode, snapshots each plan revision, and drives a rolling compare across iterations so the user only re-reads what actually changed in response to last round's annotations — same UX win--comparegives ad-hoc, applied to the plan-review loop.State carrier — plan-content-as-marker. Hook state survives across tool boundaries via an HTML-comment line the agent prepends to its revised plan:
<!-- previous revision: /tmp/plan-rev-AAA.md -->. Hook reads it, strips it from the saved snapshot, opens--compare-old=AAA --compare-new=BBB. Markers that don't resolve to a snapshot under$TMPDIRwith theplan-rev-*prefix are rejected (closes a confused-deputy primitive — agent can't point the marker at~/.ssh/id_rsa). Path resolution returns the canonicalPath.resolve()to close a TOCTOU on symlink markers.Failure modes are recoverable. If the agent forgets the marker → fall back to
--only, deny reason re-issues the contract, self-heals next round. If the launcher exits non-zero (no terminal available, AppleScript split fails, etc.) → preserveold_snap, surface the exit code viaask, do not pretend the plan was reviewed.Scope. Claude-only — codex has no hook system, pi's flow is different. No marker convention is exposed in the diff-review skill (skill is in-process with Claude, doesn't need it).