In prek 0.4.2, a hook that rewrites a file in place - even with byte-identical content - can be reported as having modified files. This affects every mode that uses the "clean worktree" fast path (plain prek run, --last-commit, --from-ref/--to-ref); only --all-files is always safe. prek 0.4.1 handles the same input correctly.
I'm not sure how many hooks are affected by this, this regression seems to be quite specific for formatters that always rewrite the contents (all major formatter hooks are safe) - we're using one that started to fail after release.
Reproducer
mkdir repro && cd repro && git init -q -b main
git config user.email t@t && git config user.name t
cat > .pre-commit-config.yaml <<'EOF'
repos:
- repo: local
hooks:
- id: stat-touch
name: rewrite identical content (bumps mtime)
entry: bash -euc 'for f in "$@"; do c=$(cat "$f"); printf "%s" "$c" > "$f"; done' --
language: system
files: \.txt$
EOF
printf "old" > a.txt && git add . && git commit -qm initial
git checkout -qb feature
printf "new" > a.txt && git add . && git commit -qm change
md5sum a.txt
uvx --from prek==0.4.2 prek run --from-ref main --to-ref feature --verbose
md5sum a.txt # identical
22af645d1859cb5ca6da0c484f1f37ea a.txt
rewrite identical content (bumps mtime)..................................Failed
- hook id: stat-touch
- files were modified by this hook
22af645d1859cb5ca6da0c484f1f37ea a.txt
0.4.1 reports Passed on the same setup.
| prek |
--from-ref/--to-ref |
--all-files |
| 0.4.1 |
Passed |
Passed |
| 0.4.2 |
Failed (false positive) |
Passed |
Root cause
PR #2109 ("Optimize diff checks for clean worktrees") added DiffTracker with a Clean baseline that uses has_worktree_diff as the post-hook check:
// crates/prek/src/cli/run/diff.rs
DiffBaseline::Clean => {
if !git::has_worktree_diff(self.path).await? {
return Ok(false);
}
self.baseline = DiffBaseline::Snapshot(git::get_diff(self.path).await?);
Ok(true) // reached on any stat change, content not checked
}
has_worktree_diff runs git diff-files --quiet, which is stat-cache based:
$ echo hello > a && git add a && git commit -qm i
$ git diff-files --quiet; echo $? # 0
$ cat a > /tmp/x && cat /tmp/x > a # rewrite, identical content
$ git diff-files --quiet; echo $? # 1 <- false positive
$ git diff --quiet; echo $? # 0 (content-aware)
Any in-place rewrite - which is what some formatters do, even on a no-op pass - bumps mtime, trips --quiet, and the Clean arm returns Ok(true) without ever checking content.
The Clean baseline is selected whenever should_stash = !all_files && files.is_empty() && directories.is_empty() -> worktree_cleaned = true -> clean_baseline = true. That is true for plain prek run, --last-commit, and --from-ref/--to-ref alike (confirmed reproducing on --last-commit as well). Only --all-files takes the unknown_baseline path, which content-diffs via get_diff before/after, so stat-only changes wash out there.
PR #2109's comment message calls out the analogous concern for CRLF (diff-files --quiet false positives on CRLF-normalized worktrees), but the same mechanism produces false positives for any plain rewrite.
Suggested fix
In the DiffBaseline::Clean arm, fall through to get_diff() when has_worktree_diff returns true, and only report Ok(true) if the content-aware diff is non-empty. Keeps the fast path for hooks that genuinely don't touch files; only pays the content-diff cost on a stat-change (rare in steady state).
// crates/prek/src/cli/run/diff.rs
DiffBaseline::Clean => {
if !git::has_worktree_diff(self.path).await? {
return Ok(false);
}
// `git diff-files --quiet` is stat-based and reports dirty for any in-place rewrite
// (mtime bump) even when content is unchanged
let curr_diff = git::get_diff(self.path).await?;
if curr_diff.is_empty() {
return Ok(false);
}
self.baseline = DiffBaseline::Snapshot(curr_diff);
Ok(true)
}
This change was tested to work in the same environment.
Environment
- prek 0.4.2 (PyPI via uvx), Ubuntu 24.04. Reproducible with the snippet above on a fresh checkout.
edit: Added suggested fix section (not confident enough to create a PR as I never worked with Rust, I'll let a professional do it)
In prek 0.4.2, a hook that rewrites a file in place - even with byte-identical content - can be reported as having modified files. This affects every mode that uses the "clean worktree" fast path (plain
prek run,--last-commit,--from-ref/--to-ref); only--all-filesis always safe. prek 0.4.1 handles the same input correctly.I'm not sure how many hooks are affected by this, this regression seems to be quite specific for formatters that always rewrite the contents (all major formatter hooks are safe) - we're using one that started to fail after release.
Reproducer
0.4.1 reports
Passedon the same setup.--from-ref/--to-ref--all-filesRoot cause
PR #2109 ("Optimize diff checks for clean worktrees") added
DiffTrackerwith aCleanbaseline that useshas_worktree_diffas the post-hook check:has_worktree_diffrunsgit diff-files --quiet, which is stat-cache based:Any in-place rewrite - which is what some formatters do, even on a no-op pass - bumps
mtime, trips--quiet, and theCleanarm returnsOk(true)without ever checking content.The
Cleanbaseline is selected whenevershould_stash = !all_files && files.is_empty() && directories.is_empty()->worktree_cleaned = true->clean_baseline = true. That is true for plainprek run,--last-commit, and--from-ref/--to-refalike (confirmed reproducing on--last-commitas well). Only--all-filestakes theunknown_baselinepath, which content-diffs viaget_diffbefore/after, so stat-only changes wash out there.PR #2109's comment message calls out the analogous concern for CRLF (
diff-files --quietfalse positives on CRLF-normalized worktrees), but the same mechanism produces false positives for any plain rewrite.Suggested fix
In the
DiffBaseline::Cleanarm, fall through toget_diff()whenhas_worktree_diffreturns true, and only reportOk(true)if the content-aware diff is non-empty. Keeps the fast path for hooks that genuinely don't touch files; only pays the content-diff cost on a stat-change (rare in steady state).This change was tested to work in the same environment.
Environment
edit: Added suggested fix section (not confident enough to create a PR as I never worked with Rust, I'll let a professional do it)