Optimize fix_byte_order_marker by shifting file contents in place #1790
💡 Codex Review
prek/crates/prek/src/hooks/pre_commit_hooks/fix_byte_order_marker.rs Lines 26 to 29 in 6ace0dc
This now requires write permission on every candidate file before even checking the first 3 bytes, so a read-only file without a BOM will fail the hook with …
prek/crates/prek/src/cli/run/filter.rs Lines 137 to 140 in 6ace0dc
Caching …
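One way to address the write-permission concern raised in the review is to probe for the BOM through a read-only handle and open the file for writing only when a BOM is actually present. The following is a minimal sketch under that assumption, not the code in this PR; `has_utf8_bom` and `open_for_fix` are hypothetical names.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read};
use std::path::Path;

const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: checks the first 3 bytes through a read-only handle.
fn has_utf8_bom(path: &Path) -> io::Result<bool> {
    let mut file = File::open(path)?;
    let mut prefix = [0u8; 3];
    match file.read_exact(&mut prefix) {
        Ok(()) => Ok(prefix == UTF8_BOM),
        // Files shorter than 3 bytes cannot start with a UTF-8 BOM.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical helper: requests write access only when a fix is needed,
/// so a read-only file without a BOM never triggers a permission error.
fn open_for_fix(path: &Path) -> io::Result<Option<File>> {
    if !has_utf8_bom(path)? {
        return Ok(None);
    }
    OpenOptions::new()
        .read(true)
        .write(true)
        .open(path)
        .map(Some)
}
```

With this shape, a caller would treat `Ok(None)` as "nothing to fix" and only see a permission error for files that really need rewriting.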
Pull request overview
This PR introduces several performance-oriented changes across prek’s built-in hooks and file filtering, including an in-place rewrite strategy when removing UTF-8 BOMs.
Changes:
- Optimize `fix_byte_order_marker` to remove a UTF-8 BOM by shifting file contents in-place instead of rewriting via a full-buffer read.
- Refactor `detect_private_key` to scan files incrementally using an `aho-corasick` matcher (new dependency).
- Cache `tags_from_path` results in the CLI file filter to avoid recomputing tags for the same files.
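To illustrate the in-place shifting idea from the first item, here is a rough sketch that moves the file body 3 bytes toward the start through a fixed-size buffer and then truncates the tail. It is an illustration under assumptions, not the PR's implementation: `strip_bom_in_place` and the `CHUNK` size are hypothetical, and the sketch opens the file read-write up front for brevity, which is exactly the behavior the Codex review comment questions.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];
/// Assumed buffer size; the value used in the PR is not shown here.
const CHUNK: usize = 8 * 1024;

/// Hypothetical helper: returns Ok(true) if a BOM was found and removed.
fn strip_bom_in_place(path: &Path) -> io::Result<bool> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    let mut prefix = [0u8; 3];
    match file.read_exact(&mut prefix) {
        Ok(()) if prefix == UTF8_BOM => {}
        Ok(()) => return Ok(false),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(false),
        Err(e) => return Err(e),
    }

    let mut buf = vec![0u8; CHUNK];
    let mut read_pos = UTF8_BOM.len() as u64; // next byte to read
    let mut write_pos = 0u64; // next byte to overwrite
    loop {
        file.seek(SeekFrom::Start(read_pos))?;
        let n = match file.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // The write ends at write_pos + n, which is always before the next
        // unread byte at read_pos + n, so unread data is never clobbered.
        file.seek(SeekFrom::Start(write_pos))?;
        file.write_all(&buf[..n])?;
        read_pos += n as u64;
        write_pos += n as u64;
    }

    // Drop the now-duplicated last 3 bytes.
    file.set_len(write_pos)?;
    Ok(true)
}
```

The trade-off is that peak memory stays at one chunk regardless of file size, at the cost of extra seeks; an interrupted run can also leave the file partially shifted, which a whole-file rewrite to a temporary file would avoid.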
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs | Refactors file sizing / scanning helpers and uses a fixed-size scan buffer. |
| crates/prek/src/hooks/pre_commit_hooks/fix_byte_order_marker.rs | Implements in-place shifting to remove BOM without allocating a full second file buffer. |
| crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs | Switches from whole-file reads to chunked scanning with aho-corasick + carry window. |
| crates/prek/src/cli/run/filter.rs | Adds a per-project cache of computed file tags to reduce repeated tag detection work. |
| crates/prek/Cargo.toml | Adds aho-corasick as a dependency for prek. |
| Cargo.toml | Adds aho-corasick to workspace dependencies. |
| Cargo.lock | Locks the new aho-corasick dependency. |
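The "chunked scanning + carry window" approach described for `detect_private_key.rs` can be sketched with the standard library alone. Note the substitution: this sketch uses a naive substring search in place of the `aho-corasick` matcher the PR adds, and the `MARKERS` list and `contains_marker` function are hypothetical stand-ins rather than the hook's real pattern set or API. The point it shows is the carry window: the last `max_len - 1` bytes of each chunk are kept so a marker split across a chunk boundary is still found.

```rust
use std::io::{self, Read};

/// Hypothetical marker list, for illustration only.
const MARKERS: [&[u8]; 2] = [b"BEGIN RSA PRIVATE KEY", b"BEGIN OPENSSH PRIVATE KEY"];

/// Scans `reader` in chunks of `chunk_size` bytes (must be > 0) and reports
/// whether any marker occurs, including across chunk boundaries.
fn contains_marker<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<bool> {
    // A match of length L can start at most L - 1 bytes before a boundary.
    let carry_len = MARKERS.iter().map(|m| m.len()).max().unwrap_or(1) - 1;
    let mut window: Vec<u8> = Vec::with_capacity(carry_len + chunk_size);
    let mut chunk = vec![0u8; chunk_size];

    loop {
        let n = match reader.read(&mut chunk) {
            Ok(0) => return Ok(false),
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // window = carried tail of earlier data + freshly read bytes
        window.extend_from_slice(&chunk[..n]);

        // Naive search; the PR uses an aho-corasick automaton here instead.
        if MARKERS
            .iter()
            .any(|m| window.windows(m.len()).any(|w| w == *m))
        {
            return Ok(true);
        }

        // Keep only the tail that could still begin a match.
        let keep = window.len().min(carry_len);
        window.drain(..window.len() - keep);
    }
}
```

Memory use is bounded by `carry_len + chunk_size` bytes no matter how large the input is, which is the main benefit over reading the whole file.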
📦 Cargo Bloat Comparison
Binary size change: +0.00% (24.7 MiB → 24.7 MiB)
⚡️ Hyperfine Benchmarks
Summary: 0 regressions, 0 improvements above the 10% threshold.
CLI Commands
Benchmarking basic commands in the main repo:
prek --version
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base --version` | 2.4 ± 0.1 | 2.2 | 2.9 | 1.04 ± 0.07 |
| `prek-head --version` | 2.3 ± 0.1 | 2.2 | 3.1 | 1.00 |
prek list
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base list` | 9.0 ± 1.0 | 8.7 | 14.8 | 1.02 ± 0.11 |
| `prek-head list` | 8.9 ± 0.1 | 8.7 | 9.3 | 1.00 |
prek validate-config .pre-commit-config.yaml
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base validate-config .pre-commit-config.yaml` | 3.1 ± 0.1 | 3.0 | 3.3 | 1.01 ± 0.02 |
| `prek-head validate-config .pre-commit-config.yaml` | 3.1 ± 0.0 | 3.0 | 3.2 | 1.00 |
prek sample-config
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base sample-config` | 2.7 ± 0.0 | 2.6 | 2.8 | 1.01 ± 0.03 |
| `prek-head sample-config` | 2.7 ± 0.1 | 2.6 | 2.8 | 1.00 |
Cold vs Warm Runs
Comparing first run (cold) vs subsequent runs (warm cache):
prek run --all-files (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 149.5 ± 2.4 | 146.4 | 153.2 | 1.00 |
| `prek-head run --all-files` | 150.0 ± 1.8 | 147.8 | 153.5 | 1.00 ± 0.02 |
prek run --all-files (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 149.8 ± 1.3 | 148.0 | 152.3 | 1.00 |
| `prek-head run --all-files` | 150.7 ± 2.7 | 146.7 | 155.6 | 1.01 ± 0.02 |
Full Hook Suite
Running the builtin hook suite on the benchmark workspace:
prek run --all-files (full builtin hook suite)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 150.5 ± 2.3 | 146.2 | 155.4 | 1.01 ± 0.02 |
| `prek-head run --all-files` | 149.3 ± 2.2 | 144.7 | 154.8 | 1.00 |
Individual Hook Performance
Benchmarking each hook individually on the test repo:
prek run trailing-whitespace --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run trailing-whitespace --all-files` | 30.0 ± 43.6 | 21.1 | 260.8 | 1.40 ± 2.03 |
| `prek-head run trailing-whitespace --all-files` | 21.5 ± 0.4 | 21.0 | 22.2 | 1.00 |
prek run end-of-file-fixer --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run end-of-file-fixer --all-files` | 27.0 ± 2.0 | 23.8 | 30.0 | 1.00 |
| `prek-head run end-of-file-fixer --all-files` | 27.0 ± 2.0 | 24.4 | 30.9 | 1.00 ± 0.10 |
prek run check-json --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-json --all-files` | 12.8 ± 0.2 | 12.2 | 13.2 | 1.03 ± 0.04 |
| `prek-head run check-json --all-files` | 12.5 ± 0.4 | 11.5 | 13.3 | 1.00 |
prek run check-yaml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-yaml --all-files` | 11.9 ± 0.3 | 11.5 | 13.2 | 1.01 ± 0.03 |
| `prek-head run check-yaml --all-files` | 11.7 ± 0.1 | 11.6 | 12.0 | 1.00 |
prek run check-toml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-toml --all-files` | 12.0 ± 0.2 | 11.5 | 12.5 | 1.00 |
| `prek-head run check-toml --all-files` | 12.0 ± 0.2 | 11.7 | 12.7 | 1.00 ± 0.03 |
prek run check-xml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-xml --all-files` | 12.0 ± 0.2 | 11.6 | 12.4 | 1.00 |
| `prek-head run check-xml --all-files` | 12.1 ± 0.2 | 11.7 | 12.5 | 1.00 ± 0.03 |
prek run fix-byte-order-marker --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run fix-byte-order-marker --all-files` | 20.4 ± 0.7 | 19.2 | 22.2 | 1.00 |
| `prek-head run fix-byte-order-marker --all-files` | 20.6 ± 0.7 | 19.2 | 22.0 | 1.01 ± 0.05 |
Installation Performance
Benchmarking hook installation (fast path hooks skip Python setup):
prek install-hooks (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 4.8 ± 0.1 | 4.7 | 4.9 | 1.00 |
| `prek-head install-hooks` | 4.8 ± 0.0 | 4.8 | 4.9 | 1.01 ± 0.01 |
prek install-hooks (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 4.8 ± 0.1 | 4.8 | 4.9 | 1.00 |
| `prek-head install-hooks` | 4.8 ± 0.1 | 4.8 | 4.9 | 1.00 ± 0.02 |
File Filtering/Scoping Performance
Testing different file selection modes:
prek run (staged files only)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run` | 52.6 ± 1.4 | 50.4 | 55.3 | 1.01 ± 0.03 |
| `prek-head run` | 52.4 ± 1.1 | 50.8 | 54.7 | 1.00 |
prek run --files '*.json' (specific file type)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --files '*.json'` | 9.1 ± 0.1 | 8.9 | 9.3 | 1.00 |
| `prek-head run --files '*.json'` | 9.1 ± 0.1 | 8.9 | 9.2 | 1.00 ± 0.02 |
Workspace Discovery & Initialization
Benchmarking hook discovery and initialization overhead:
prek run --dry-run --all-files (measures init overhead)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --dry-run --all-files` | 14.1 ± 0.3 | 13.8 | 15.0 | 1.00 ± 0.02 |
| `prek-head run --dry-run --all-files` | 14.1 ± 0.1 | 13.8 | 14.3 | 1.00 |
Meta Hooks Performance
Benchmarking meta hooks separately:
prek run check-hooks-apply --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-hooks-apply --all-files` | 14.2 ± 0.2 | 13.9 | 14.5 | 1.12 ± 0.02 |
| `prek-head run check-hooks-apply --all-files` | 12.6 ± 0.1 | 12.5 | 12.8 | 1.00 |
prek run check-useless-excludes --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-useless-excludes --all-files` | 12.5 ± 0.1 | 12.3 | 12.8 | 1.00 |
| `prek-head run check-useless-excludes --all-files` | 12.5 ± 0.1 | 12.4 | 12.7 | 1.00 ± 0.01 |
prek run identity --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run identity --all-files` | 11.0 ± 0.1 | 10.9 | 11.2 | 1.00 |
| `prek-head run identity --all-files` | 11.0 ± 0.1 | 10.8 | 11.2 | 1.00 ± 0.01 |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## optimize-fix-end-of-file #1790 +/- ##
============================================================
- Coverage 91.69% 89.58% -2.12%
============================================================
Files 98 98
Lines 19999 20019 +20
============================================================
- Hits 18339 17934 -405
- Misses 1660 2085 +425
6ace0dc to a3b3525
4c198bf to b071af4
Shift file contents in place for BOM removal.