Skip to content

feat: --compare=old:new — two-file diff for rolling agent reviews#163

Merged
umputun merged 14 commits intoumputun:masterfrom
rashpile:compare-flag
May 1, 2026
Merged

feat: --compare=old:new — two-file diff for rolling agent reviews#163
umputun merged 14 commits intoumputun:masterfrom
rashpile:compare-flag

Conversation

@rashpile
Copy link
Copy Markdown
Contributor

@rashpile rashpile commented May 1, 2026

Motivation

revdiff is built for rolling review loops with a Claude agent: the agent produces content, the user annotates lines, the agent revises, the loop repeats until the user quits without annotations. This works well today for code diffs, where each iteration is naturally framed as a +/- view of what the agent just changed.

The same loop falls apart for agent-rewritten documents — plans, design docs, specs, generated reports. Today the agent can only re-open the rewritten file as context-only (--only), so the user has to re-read the whole document to find what changed in response to their last round of annotations. Long documents become exhausting to review iteratively, and small revisions are easy to miss.

--compare=old:new fixes this. The agent snapshots the document before rewriting it, then re-opens revdiff against the snapshot:

agent shows v1     → user annotates "tighten section 3, drop section 5"
agent rewrites     → snapshots v1, opens revdiff --compare=v1:v2
user sees only what changed → annotates again, or quits

Each iteration becomes a focused diff of just this round's edits, with full annotation support — the same loop revdiff already supports for code, now extended to any document the agent revises. The missing primitive for iterative agent-driven document review.

Usage

revdiff --compare=/tmp/plan-v1.md:docs/plans/plan.md

--compare is mutually exclusive with: refs, --staged, --only, --all-files, --stdin, --include, --exclude, --annotations.

How it works

CompareReader runs git diff --no-index and feeds the output through the same parseUnifiedDiff parser used by every other diff source. Because the display layer is unchanged, every existing feature works automatically: word-diff, compact mode with correct line-count dividers, syntax highlighting, scrollbar, and inline annotations.

A few non-obvious details:

  • Exit code 1 from git (files differ — the normal case) is treated as success. Exit 1 with empty stdout is a real failure (e.g. missing file).
  • In compact mode, the old file's line count is needed for trailing dividers. It's computed via streaming reads, not os.ReadFile, so large files don't get pulled into memory.
  • No git repository is required — git diff --no-index works anywhere git is on $PATH.

Changes

  • app/diff/compare.go — new CompareReader type implementing Renderer
  • app/config.go--compare flag + validateCompareFlag (parses old:new, validates both paths are regular files, enforces mutual exclusions)
  • app/main.go — wires compare mode before stdin/VCS dispatch; resolveComparePaths resolves both paths to absolute; reloadApplicable returns true for compare mode
  • app/reviewinfo.go — info overlay shows "two-file diff" header for compare mode
  • README, site/docs.html, plugin usage reference — documented with examples

Verification

make test                                         # unit tests pass
echo "a" > /tmp/a.txt && echo "b" > /tmp/b.txt
revdiff --compare=/tmp/a.txt:/tmp/b.txt           # two-line diff opens
revdiff --compare=/tmp/a.txt:/tmp/a.txt           # identical files: empty diff, no crash
revdiff --compare=/tmp/a.txt:/tmp/b.txt --stdin   # error: mutual exclusion

Update

feat(planning-hook): rolling compare-mode review across plan revisions

Heads up — also rolled in the planning-hook orchestration that was explicitly punted in the original PR scope ("A planning skill / hook that orchestrates this loop automatically … is not part of this PR.").

It lives in plugins/revdiff-planning/ and is the first concrete consumer of --compare-old/--compare-new end-to-end.

What it does. Intercepts ExitPlanMode, snapshots each plan revision, and drives a rolling compare across iterations so the user only re-reads what actually changed in response to last round's annotations — same UX win --compare gives ad-hoc, applied to the plan-review loop.

State carrier — plan-content-as-marker. Hook state survives across tool boundaries via an HTML-comment line the agent prepends to its revised plan: <!-- previous revision: /tmp/plan-rev-AAA.md -->. Hook reads it, strips it from the saved snapshot, opens --compare-old=AAA --compare-new=BBB. Markers that don't resolve to a snapshot under $TMPDIR with the plan-rev-* prefix are rejected (closes a confused-deputy primitive — agent can't point the marker at ~/.ssh/id_rsa). Path resolution returns the canonical Path.resolve() to close a TOCTOU on symlink markers.

Failure modes are recoverable. If the agent forgets the marker → fall back to --only, deny reason re-issues the contract, self-heals next round. If the launcher exits non-zero (no terminal available, AppleScript split fails, etc.) → preserve old_snap, surface the exit code via ask, do not pretend the plan was reviewed.

Scope. Claude-only — codex has no hook system, pi's flow is different. No marker convention is exposed in the diff-review skill (skill is in-process with Claude, doesn't need it).

@rashpile rashpile requested a review from umputun as a code owner May 1, 2026 07:49
Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommendation: scrap --compare and instead extend --stdin to parse unified-diff input (--stdin-format=patch or auto-detect diff --git headers). That covers git diff --no-index a b | revdiff --stdin --stdin-format=patch in maybe 30 lines vs ~290 here. A calling skill wraps either invocation just as cleanly so launch friction doesn't matter, and the remaining differences (info popup label, both-paths display) are cosmetic and solvable on the stdin side with small extensions. Same workflow, much less surface area, no new top-level mode.

separately: the framing "rolling agent review" oversells what this is. The actual gap is mechanical: revdiff has no clean way to diff two on-disk files when neither side is anchored in a git ref. That's a generic primitive (config backups, upstream-vs-fork comparisons, etc), not specifically agent-shaped.

independent of which approach: plugin docs not synced. .claude-plugin/skills/revdiff/references/usage.md updated but plugins/codex/skills/revdiff/references/usage.md was not. Per CLAUDE.md they must stay byte-identical.

issues below apply only if you keep --compare as proposed; switching to the stdin path would moot most:

  1. colon path parsing is fragile. strings.Cut(opts.Compare, ":") at app/compare.go:65 splits on first colon. Breaks legal POSIX filenames containing : (e.g. timestamped snapshots like 2026-05-01T12:30:00.md) and forecloses any future Windows port. Zero tests for colon-in-path. Options: keep syntax and document the limit, switch to two repeated flags (--compare-old=/--compare-new=, matches --include/--exclude), or two positional args (revdiff --compare /old /new, matches revdiff master feature). Either alternative kills the bug.

  2. DRY hazard. validateCompareFlag (app/compare.go:13-58) and resolveComparePaths (app/compare.go:64-75) both parse opts.Compare with strings.Cut(":", ...) separately. Either have validate return (absOld, absNew, error) or extract a private parseCompareFlag shared by both, otherwise the parser semantics drift.

  3. fragile test. TestParseArgs_CompareConflicts uses non-existent /tmp/a:/tmp/b paths and only passes because conflict checks return before os.Stat. If check order ever flips to stat-first, every conflict subtest silently starts passing for the wrong reason. Use real tempdir paths or assert both error type and message.

  4. countFileLines should be a method on CompareReader. app/diff/compare.go:79, called only from (r *CompareReader).FileDiff. Project rule: functions called only from struct methods must be methods.

  5. test coverage gaps. Nothing covers: paths-with-colons (relates to #1), binary files in compare mode, large files (the streaming countFileLines exists for this reason and isn't exercised), no-trailing-newline through the full compare flow, exit-code-2 from git, stderr-only success path, symlinks, or resolveComparePaths directly.

  6. mutual-exclusion check. --compare is not excluded from --description/--description-file. Intentional (orthogonal context) or oversight?

minor: insert "--" before path args in the argv at app/diff/compare.go:34. filepath.Abs already makes paths absolute so -flag-shaped paths can't reach git in practice, but the -- hardens that guarantee against future refactors.

tests/lint/race green, argv-style exec safe, no scope creep, no vendor changes.

@rashpile
Copy link
Copy Markdown
Contributor Author

rashpile commented May 1, 2026

@umputun My use case for this is a planning hook in the agent loop. I've already validated the approach locally with --compare, but I can adapt to a stdin-based implementation if you'd prefer.

On --compare vs. extending stdin: I'd push back gently on the redesign. revdiff is a diff review tool - "diff two files" feels like a primary operation, not a corner case to compose. The stdin route would also lose compact-mode toggle and reload-on-demand (the bytes are consumed; revdiff doesn't own the source). The 290-line cost is real, but I think the symmetry with existing modes (--staged, --all-files, --only) and the cleaner UX justify a first-class flag rather than a pipe-composed alternative.

Here's the design for plan-review-hook.py:

Design: plan-content-as-state

The hook is stateless on disk between invocations — Claude Code doesn't persist hook state, and inventing a sidecar registry would be brittle across parallel sessions. Instead, the plan itself carries its diff history via an html-comment marker on the first line:

<!-- previous revision: /tmp/plan-rev-AAA.md -->

The marker is invisible in rendered markdown, trivially greppable, and self-cleaning (each plan chain references its own snapshot, so parallel sessions can't collide). On every ExitPlanMode the hook:

  1. Reads the marker from the new plan (if present)
  2. Strips it from the content saved to disk — keeps diffs clean
  3. Writes the stripped content to a fresh /tmp/plan-rev-*.md snapshot
  4. Opens revdiff in --compare against the marker target if it exists, else falls back to --only
  5. On annotations → deny + tells the agent to put <!-- previous revision: <new-snapshot> --> as the first line of the next plan
  6. On no annotations → ask + deletes the new snapshot (loop is over)
  7. Deletes the old snapshot after the compare succeeds
agent calls ExitPlanMode (v1)   → hook saves /tmp/plan-rev-AAA.md, opens --only
user annotates "tighten step 3" → hook denies, tells agent to start v2 with marker
agent calls ExitPlanMode (v2)   → hook reads marker, saves /tmp/plan-rev-BBB.md,
                                   opens --compare=AAA:BBB (focused diff of round 2)
user annotates / quits clean    → loop continues or terminates with cleanup

Does this plan look OK to you?

@umputun
Copy link
Copy Markdown
Owner

umputun commented May 1, 2026

fair points on compact-mode and orchestrator. Compact toggle re-runs the diff with smaller context, which --compare does cleanly via the same constructor; --stdin would have to fake it by post-trimming context locally. Real functional differentiator, I had this wrong. Combined with the planning hook actually being in flight (not speculative), --compare as a first-class mode is fine. Withdrawing the scrap-and-replace recommendation.

line-level issues from the original review still stand: colon parsing fragility (#1), plugin docs sync miss, DRY hazard, fragile test, countFileLines shape, test gaps, mutual-exclusion gap, the -- argv nit. None of those depend on which mode shape you pick.

on the hook design:

the marker-in-plan approach is clever. It keeps state in band so parallel sessions don't collide and hooks-are-stateless isn't a problem. Two concerns:

  1. agent-discipline risk. Chain depends on the agent prepending <!-- previous revision: ... --> to v2+. If it forgets (tool drift, context truncation, model swap), the hook treats v2 as v1 and opens --only instead of --compare. User loses the focused diff for that round. Recoverable since fallback is --only not a crash, but the deny-message instruction has to survive across multiple turns. Probably fine in practice, worth acknowledging the failure mode.

  2. snapshot cleanup timing. "deletes old snapshot after compare succeeds": which exit counts as success? Revdiff exits non-zero on annotated quit, zero on clean quit. If any exit deletes AAA, a user who closes mid-thought breaks the next iteration. Pin down the trigger explicitly.

design looks good otherwise. Ship it once the line-level items are addressed.

rashpile added 13 commits May 1, 2026 23:44
feat: implement CompareReader for git diff --no-index two-file diff

feat: wire --compare flag into main.go run() function

Switch renderer selection to a switch statement covering compare,
stdin, and VCS modes. CompareReader gets workDir from filepath.Dir
of the new path. commitsApplicable and compactApplicable already
handle CompareReader correctly without changes.

Also remove stale nolintlint directive in overlay/info.go.

chore: verify acceptance criteria for --compare flag

All acceptance criteria verified: diff between two files, identical-file
empty diff, compact mode with dividers, word-diff, and all mutual-exclusion
error cases. Full test suite passes (make test).

feat: use --compare in planning plugin for plan iteration diffs

After Claude revises an annotated plan, re-opening revdiff now shows
only what changed via --compare=prev:current instead of the full file.
First open uses --only as before. Pass --cleanup to discard the snapshot.

docs: add --compare flag to README, docs, and plugin usage reference

Document the two-file diff mode in all three locations: README (Options table,
Examples, new Two-File Diff section), site/docs.html (options table, examples,
new section), and the Claude plugin usage reference.

fix: address code review findings

- validateCompareFlag: reject --include and --exclude (silently ignored in compare mode)
- main.go: use filepath.Abs before filepath.Dir to avoid "project: ." in info overlay for relative paths
- compare.go: replace pre-allocated make+append slice with composite literal (removes wrong mnd comment)
- compare_test.go: use t.TempDir() paths instead of literal /tmp/ in TestCompareReader_ChangedFiles
- renderer_setup_test.go: add CompareReader case to TestCompactApplicable, --compare case to TestCommitsApplicable
- launch-plan-review.sh: guard PLAN_ABS cd against missing directory so --cleanup survives deleted plan dirs
- CLAUDE.md: document --compare mode in setupVCSRenderer, mutual-exclusion, compact mode, and project structure

fix: address code review findings

- diffError: exit code 1 with stdout content is success even if stderr has warnings; only exit code 1 with empty stdout is a real failure (e.g. missing file)
- main.go: resolve both --compare paths to absolute before passing to NewCompareReader
- docs: add --include/--exclude to --compare mutual-exclusion list in README, docs.html, and usage.md
- reviewinfo: add Compare field to ReviewInfoConfig; reviewHeaderText returns "two-file diff" instead of "standalone files" for --compare mode
- renderer_setup_test: add compare mode case to TestReloadApplicable

fix: address code review findings

- Handle filepath.Abs errors in run() compare case instead of discarding
  with _; extracted into resolveComparePaths helper to keep gocyclo ≤20
- Move plan snapshot (cp PLAN_ABS PREV_FILE) to after each overlay
  successfully exits in launch-plan-review.sh; snapshot before launch
  caused stale prev-file if the terminal command failed mid-dispatch

fix: address code review findings

- countFileLines: stream-count lines instead of os.ReadFile to avoid
  loading the entire old file into memory when compact mode is active
  with --compare; behavior is identical (non-newline-terminated file
  counts as one additional line, empty file returns 0)
- launch-plan-review.sh: derive PREV_FILE extension from the actual
  plan file's extension instead of hardcoding .md, so syntax
  highlighting in the diff header is correct for non-markdown plans

fix: address code review findings

- Reject --compare combined with --annotations (silent annotation drop)
- Move completed plan to docs/plans/completed/

chore: remove plan file and revert planning-plugin changes (separate PR)

fix: validate --compare paths are regular files
…ate parse path

The single --compare=old:new flag forced custom colon-splitting and could
not represent file paths containing colons (e.g. ISO timestamps). Replace
it with paired --compare-old / --compare-new flags that mirror the
--include / --exclude shape, drop the strings.Cut parser entirely, and
fold resolveComparePaths into validateCompareFlag so the flag has a
single parse + validate site. Absolute paths now stash on opts at parse
time and run() consumes them directly.

Conflict checks still run before stat checks so a misuse error is not
masked by a missing-file error. --description and --description-file
remain orthogonal — compare mode keeps the info-popup prose channel.

Updates README, site/docs.html, and the diff-review skill usage docs in
the same commit so released docs do not ship stale.
…ileLines

Insert -- between flag args and path args in the git diff --no-index
invocation so a path that begins with a dash cannot be parsed as a flag,
even if a future caller bypasses filepath.Abs.

countFileLines now hangs off CompareReader and reads r.oldPath directly
— the standalone form took a path parameter only because it was written
before the constructor settled. The nolint:gosec on os.Open drops with
it (the rule only flagged the parameter form).
…ks, error paths

CompareConflicts now uses real t.TempDir() files so a future stat-first
reordering would not silently green these subtests — the assertion
under test is the conflict, not the file existence.

New CompareReader cases:
- paths_with_colons_in_name pins the OG bug killed by the two-flag form
- binary_files exercises the "Binary files ... differ" stdout path
- no_trailing_newline matrix (4 subcases) covers EOF-newline corners
- git_error_surfaces_stderr asserts diffError preserves git's stderr
- symlinks (both-symlink + mixed) confirms Stat-follows-symlink contract
- count_file_lines_large_file_streaming asserts a >5MB read stays under
  a HeapAlloc cap of file_size/4 (32KB buffer) so a future refactor that
  slurps the file would fail loudly
show only what changed between plan revisions instead of re-rendering the
full plan after every rewrite. plan content carries its own diff history
via a first-line html-comment marker pointing at the previous snapshot:

  <!-- previous revision: /tmp/plan-rev-AAA.md -->

on each ExitPlanMode the hook strips the marker, writes the new revision
to a snapshot, and either opens revdiff in --compare against the marker's
target (when present on disk) or falls back to --only mode. on annotations
the hook denies and tells the agent to put the new snapshot path as the
first line of the next plan; on no annotations it cleans up.

launch-plan-review.sh now dispatches on argc: 1 → --only=<plan>,
2 → --compare=<old>:<new>, with PLAN_FILE pointing at the new revision
for the overlay title.
in rolling plan-revision review the user wants to read the new state with
the round's edits highlighted, not pore over the full +/- diff against the
previous revision. --collapsed hides removed lines while preserving inline
highlights on what changed, which matches how human reviewers naturally
read prose revisions.

scoped to compare mode only (2-arg invocation). --only mode shows the file
as context-only with no removed lines, so --collapsed would be a no-op.
Mirror the flag-shape change from compare-flag's refactor commit. The
launcher now builds REVDIFF_ARGS as two pre-shell-quoted tokens in
compare mode (--compare-old=<old> --compare-new=<new>) instead of the
old colon-joined --compare=<old>:<new> form. sq() moved to the top of
the file so the args block can use it; the duplicate definition lower
down is gone.
…olling-review contract

The previous-iteration snapshot was unlinked unconditionally after the
launcher subprocess returned, regardless of whether the launcher actually
ran revdiff against the user. On any launcher non-zero exit (no overlay
terminal available, AppleScript split error, cmux surface lookup failure,
etc.) the user never saw the diff but old_snap was wiped, dangling the
next round's marker and silently dropping the chain to --only.

Gate the unlink on result.returncode == 0 and surface launcher failure
to the user via the hook's ask response instead of pretending the plan
was reviewed clean.

README adds a "Rolling reviews" section covering the marker contract,
the snapshot lifecycle (now correct on the failure path), and an explicit
acknowledgment of the agent-discipline failure mode (forgotten marker →
fallback to --only, self-heals on the next round). Override section
notes the two-args / one-arg dispatch and the failure semantics so
override authors know not to obscure the launcher's exit code.
… docs

Security
- planning hook restricts the agent-supplied "previous revision" marker
  to a trusted snapshot under \$TMPDIR with the plan-rev-* prefix. without
  this guard the agent could point --compare-old at any readable file
  (~/.ssh/id_rsa, /etc/passwd, …) and surface its contents through the
  diff overlay and subsequent annotations.
- marker regex now tolerates leading whitespace before <!-- so common
  LLM drift (blank line, indented fence) does not silently collapse
  the rolling-review chain to --only mode.

Hook resilience
- subprocess.run is now wrapped in try/except for TimeoutExpired and OSError;
  both branches preserve old_snap and clean up the orphan new_snap so
  the rolling chain survives a hung or failed-to-start launcher.

Test rigor
- drop three tautological "compare mode" rows in renderer_setup_test.go
  (TestCommitsApplicable, TestReloadApplicable) that passed for reasons
  unrelated to compare logic; rename the TestCompactApplicable case to
  honestly reflect what it pins (CompareReader != FileReader).
- TestCompareReader_FileDiff_GitErrorSurfacesStderr no longer accepts
  the bare "exit" fallback substring; requires an actual stderr token.
- TestCompareReader_FileDiff_BinaryFiles now asserts the exact
  one-placeholder contract instead of vacuously iterating an empty slice.
- TestCompareReader_FileDiff_Symlinks asserts non-zero add+remove counts
  matching git's actual symlink-object semantics (not the original
  NotEmpty-only check).

Cleanup
- collapse the eight repeated conflict checks in validateCompareFlag
  into a slice-driven loop; reads better, mirrors --include/--exclude
  shape on the data side.

Docs
- CLAUDE.md: replace 3 stale --compare=old:new references with --compare-old/--compare-new.
- CHANGELOG.md: new "Unreleased" section covering --compare-old/--compare-new and the rolling planning-hook review.
- site/index.html: extend All-files-mode card to mention --compare-old/--compare-new.
…on, etc.

Security
- close TOCTOU on the planning hook's marker path. trusted_snapshot now
  returns the canonical resolved Path (or None) and the caller uses it as
  old_snap; previously the hook validated the resolved target but handed
  the unresolved candidate (potentially a symlink) to git/revdiff,
  letting the symlink be re-pointed between validation and use.

Hook resilience
- propagate revdiff's exit code through every async launcher branch
  (zellij, kitty, wezterm, cmux, ghostty, iTerm2, emacs vterm) by
  writing the rc to the sentinel atomically (printf + mv -f) and exiting
  the outer launcher with that rc instead of an unconditional exit 0.
  without this, revdiff failures inside an overlay terminal silently
  surfaced as "plan reviewed, no annotations" — bypassing the new
  returncode-gating logic and the rolling-review chain. tmux already
  propagates exit code natively via display-popup -E.

Quality
- countFileLines now treats non-EOF read errors as "unknown" (returns 0)
  rather than returning a partial count; a partial count would mislabel
  the trailing-divider "⋯ N lines ⋯" footer.

Docs
- add --annotations to the compare-mode mutual-exclusion list in
  README.md, CLAUDE.md, site/docs.html, and the diff-review skill
  usage.md so the documented contract matches the validator.
mv -f atomically renames SENTINEL.tmp to SENTINEL on the success path, so
the temp file does not normally outlive the script — but if the launcher
or overlay terminal is killed mid-write, .tmp leaks in TMPDIR. add it to
each async branch's trap list so the cleanup is robust under signals.
both branches inline the inner script via sh -c instead of writing a
LAUNCH_SCRIPT, so the existing async-branch traps did not cover them.
add a per-branch trap so SENTINEL and SENTINEL.tmp clean up on signal.
Adds the --compare-old/--compare-new inline example and the full
"Two-File Diff" section that landed in the claude-plugin reference
when --compare flags shipped. The codex skills are a content-mirror
of the claude-plugin skills (copies, not symlinks); usage.md drifted
because earlier compare-flag updates only touched the claude-plugin
side.
@rashpile
Copy link
Copy Markdown
Contributor Author

rashpile commented May 1, 2026

Pushed fixes addressing the review items:

  • Flag shape: replaced --compare=old:new with --compare-old=<path> --compare-new=<path>; killed the colon-split parser and resolveComparePaths — single parse+validate path, conflict checks before stat checks.
  • Argv hardening: inserted -- separator before paths in the git diff --no-index invocation.
  • countFileLines method-ized: now (r *CompareReader) countFileLines(), drops the redundant path param.
  • Tests: TestParseArgs_CompareConflicts now uses real t.TempDir() files; added subcases for paths-with-colons, binary files, large-file streaming (heap smoke-test), no-trailing-newline matrix, git stderr surfacing, symlinks (both + mixed), same-file.
  • --description kept allowed: doesn't conflict; added explicit "see comment" rationale in validator.
  • Docs: README, site/docs.html, CLAUDE.md, skill usage refs all updated to the two-flag form; added --annotations to the documented mutex list (validator already rejected it).

Added hook implementation — PR description is updated

Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

round-1 items addressed.

new findings:

  1. stale launcher override silent-failure. The README documents ${CLAUDE_PLUGIN_DATA}/scripts/launch-plan-review.sh as the supported customization path. Anyone who copied the bundled launcher there before this PR gets bitten silently:

    • pre-PR launcher: if [ $# -lt 1 ]; then exit 1; fi; PLAN_FILE="$1" accepts $# >= 1 and only reads $1
    • new hook in compare mode passes <old_snap> <new_snap> (plan-review-hook.py:180-183)
    • resolver picks the user override (user layer wins over bundled)
    • stale override accepts the call, ignores $2, opens revdiff against the OLD revision
    • user sees previous content (which they already reviewed), submits no annotations
    • hook reports "no annotations" -> plan accepted, user never saw the new revision

    the README explicitly tells users to copy the bundled launcher as a starting template, so this is a real upgrade hazard, not theoretical. Probably needs a fix before merge.

  2. TestParseArgs_CompareConflicts asserts NotEmpty on private compareAbsOld/compareAbsNew cache fields (app/compare_test.go:23-24). Confirms population but not correctness. Compare against expected filepath.Abs(oldFile) value, or drop and rely on end-to-end coverage in app/diff/compare_test.go.

discussion-only:

trusted_snapshot (plan-review-hook.py:47-66) gates by $TMPDIR + plan-rev- prefix but doesn't bind to per-chain state. A crafted or accidentally-pointing marker can resolve to another in-flight session's snapshot. That means parallel-session review against the wrong baseline, plus the post-success unlink breaks the other chain. Self-heals via --only fallback so practical impact is one round of focused-diff loss. Worth a sidecar nonce eventually if you ever see two sessions running at once in practice.

otherwise lgtm once the launcher hazard above is resolved.

…rompt

- swap compare-mode arg order from (old, new) to (new, old) in both the
  bundled launcher's 2-arg branch and the planning hook so that stale
  user-layer overrides at ${CLAUDE_PLUGIN_DATA}/scripts/ — copies of
  master's 1-arg launcher made before compare mode shipped — silently
  degrade to --only review of the NEW revision instead of opening the
  OLD one the user already reviewed. revdiff itself still receives
  --compare-old=<OLD> --compare-new=<NEW>; the launcher relabels locally.
- tighten the deny-reason text so the agent is explicitly told to ignore
  any other plan-rev-*.md marker it may have seen earlier in the
  conversation, addressing intra-session chain-mixup at the prompt layer.
- replace assert.NotEmpty on opts.compareAbsOld/New with equality against
  filepath.Abs(input) so the test would actually catch a swapped-fields
  bug rather than just confirming the cache is populated.
- document the new launcher contract (new-revision first) and the
  graceful-degradation rationale in the planning plugin README.
@rashpile
Copy link
Copy Markdown
Contributor Author

rashpile commented May 1, 2026

addressed new findings

  • swap compare-mode arg order from (old, new) to (new, old) in launch-plan-review.sh it simple resolve for custom 1-arg launcher made before compare mode shipped - silently degrade to --only review of the NEW revision instead of opening the OLD one the user already reviewed.

  • trusted_snapshot: tighten the deny-reason text so the agent is explicitly told to ignore any other plan-rev-*.md marker it may have seen earlier in the conversation, addressing intra-session chain-mixup at the prompt layer.

Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thx

@umputun umputun merged commit 8c91398 into umputun:master May 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants