Optimize `fix_byte_order_marker` by shifting file contents in place by j178 · Pull Request #1790 · j178/prek

j178 · 2026-03-15T07:34:44Z

Shift file contents in place for BOM removal.

chatgpt-codex-connector · 2026-03-15T07:40:11Z

💡 Codex Review

prek/crates/prek/src/hooks/pre_commit_hooks/fix_byte_order_marker.rs

Lines 26 to 29 in 6ace0dc

    
    let mut file = fs_err::tokio::OpenOptions::new()
        .read(true)
        .write(true)
        .open(&file_path)

Open files read-only until BOM removal is needed

This now requires write permission on every candidate file before even checking the first 3 bytes, so a read-only file without a BOM will fail the hook with EACCES instead of being reported as unchanged. The previous logic only needed write access when a BOM was actually present, so this is a behavior regression for repositories that contain non-writable tracked files.
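One way to address this is to probe the first three bytes through a read-only handle and reopen with write access only when a BOM is found. The following is a minimal sketch of that idea, not the code in this PR: it uses blocking `std::fs` instead of `fs_err::tokio`, and the names `has_utf8_bom` and `open_for_bom_removal` are hypothetical.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read};
use std::path::Path;

const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Returns true if the file starts with a UTF-8 BOM.
/// Only read permission is required. (Hypothetical helper.)
fn has_utf8_bom(path: &Path) -> io::Result<bool> {
    let mut file = File::open(path)?;
    let mut prefix = [0u8; 3];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until the
    // prefix is full or the file ends.
    while filled < prefix.len() {
        let n = file.read(&mut prefix[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(filled == prefix.len() && prefix == UTF8_BOM)
}

/// Opens the file read-write only when a BOM is present.
/// Returns `Ok(None)` for files that need no change, so read-only files
/// without a BOM never trigger a permission error. (Hypothetical helper.)
fn open_for_bom_removal(path: &Path) -> io::Result<Option<File>> {
    if !has_utf8_bom(path)? {
        return Ok(None);
    }
    OpenOptions::new().read(true).write(true).open(path).map(Some)
}
```

With this shape, a non-writable file that has no BOM is reported as unchanged, and a permission error can only surface for files that actually need rewriting.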

prek/crates/prek/src/cli/run/filter.rs

Lines 137 to 140 in 6ace0dc

    
    let file_tags = filenames
        .par_iter()
        .filter_map(|filename| match tags_from_path(filename) {
            Ok(tags) => Some((*filename, tags)),

Recompute tags for each hook invocation

Caching tags_from_path once at project-filter construction makes hook file selection stale after earlier hooks modify files, even though run_hook calls filter.for_hook repeatedly across the run. Because tags_from_path depends on file metadata/content (e.g., executable bit, shebang, text/binary detection), a fixer that changes those attributes can cause later hooks with types/exclude_types to be incorrectly skipped or included.
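A possible middle ground between caching and recomputing is to keep the cache but drop entries for files that a hook has modified. The sketch below illustrates the staleness problem and that remedy under loud assumptions: `sketch_tags` is a hypothetical stand-in for `tags_from_path` that only looks for a shebang, `TagCache` is not a type from prek, and the caller is assumed to know which files a hook touched.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `tags_from_path`: tags every file as "file"
/// and adds "shebang" when the content starts with `#!`.
fn sketch_tags(path: &Path) -> io::Result<Vec<&'static str>> {
    let bytes = fs::read(path)?;
    let mut tags = vec!["file"];
    if bytes.starts_with(b"#!") {
        tags.push("shebang");
    }
    Ok(tags)
}

/// Hypothetical per-run tag cache with explicit invalidation.
#[derive(Default)]
struct TagCache {
    entries: HashMap<PathBuf, Vec<&'static str>>,
}

impl TagCache {
    /// Returns cached tags, computing them on first access.
    fn get(&mut self, path: &Path) -> io::Result<&[&'static str]> {
        if !self.entries.contains_key(path) {
            let tags = sketch_tags(path)?;
            self.entries.insert(path.to_path_buf(), tags);
        }
        Ok(self.entries[path].as_slice())
    }

    /// Forgets a file so the next `get` recomputes its tags.
    /// Intended to be called for every file a hook modified.
    fn invalidate(&mut self, path: &Path) {
        self.entries.remove(path);
    }
}
```

Without the `invalidate` call, a later hook would keep seeing the tags computed before an earlier fixer rewrote the file, which is the failure mode described above.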

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

This PR introduces several performance-oriented changes across prek’s built-in hooks and file filtering, including an in-place rewrite strategy when removing UTF-8 BOMs.

Changes:

Optimize fix_byte_order_marker to remove a UTF-8 BOM by shifting file contents in-place instead of rewriting via a full-buffer read.
Refactor detect_private_key to scan files incrementally using an aho-corasick matcher (new dependency).
Cache tags_from_path results in the CLI file filter to avoid recomputing tags for the same files.
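To make the first change concrete, here is a minimal sketch of removing a BOM by shifting contents in place: each chunk is read from its current offset, written three bytes earlier, and the file is truncated at the end. It is an illustration under assumptions rather than the PR's implementation: it uses blocking `std::fs`, and the function name `strip_bom_in_place` and the `chunk_size` parameter are hypothetical.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

const BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Removes a leading UTF-8 BOM by shifting the rest of the file left by
/// three bytes, using one reusable buffer of `chunk_size` bytes.
/// Returns true if the file was modified. (Hypothetical helper.)
fn strip_bom_in_place(path: &Path, chunk_size: usize) -> io::Result<bool> {
    // Probe with a read-only handle so unchanged files need no write access.
    {
        let mut probe = File::open(path)?;
        let mut prefix = [0u8; 3];
        match probe.read_exact(&mut prefix) {
            Ok(()) if prefix == BOM => {}
            Ok(()) => return Ok(false),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(false),
            Err(e) => return Err(e),
        }
    }

    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let len = file.metadata()?.len();
    let bom_len = BOM.len() as u64;
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut read_pos = bom_len;

    while read_pos < len {
        let want = (len - read_pos).min(buf.len() as u64) as usize;
        // Read the next chunk from its current position...
        file.seek(SeekFrom::Start(read_pos))?;
        file.read_exact(&mut buf[..want])?;
        // ...and write it three bytes earlier. The write only touches
        // bytes that have already been read, so nothing is lost.
        file.seek(SeekFrom::Start(read_pos - bom_len))?;
        file.write_all(&buf[..want])?;
        read_pos += want as u64;
    }

    // Drop the three stale bytes left at the end of the file.
    file.set_len(len - bom_len)?;
    Ok(true)
}
```

Memory use is bounded by `chunk_size` regardless of file size. The trade-off is that the rewrite is not atomic: an interruption partway through the loop leaves the file partially shifted.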

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.


File	Description
crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs	Refactors file sizing / scanning helpers and uses a fixed-size scan buffer.
crates/prek/src/hooks/pre_commit_hooks/fix_byte_order_marker.rs	Implements in-place shifting to remove BOM without allocating a full second file buffer.
crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs	Switches from whole-file reads to chunked scanning with `aho-corasick` + carry window.
crates/prek/src/cli/run/filter.rs	Adds a per-project cache of computed file tags to reduce repeated tag detection work.
crates/prek/Cargo.toml	Adds `aho-corasick` as a dependency for `prek`.
Cargo.toml	Adds `aho-corasick` to workspace dependencies.
Cargo.lock	Locks the new `aho-corasick` dependency.
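The "chunked scanning with carry window" idea from the `detect_private_key.rs` row can be sketched as follows. This is a standard-library-only illustration, not the PR's code: it swaps the `aho-corasick` matcher for a naive substring search, and the function name `contains_any`, its parameters, and any patterns passed to it are hypothetical. The carry window keeps the last `max_len - 1` bytes of each chunk so a pattern that straddles a chunk boundary is still found.

```rust
use std::io::{self, Read};

/// Returns true if any of `needles` occurs in the stream.
/// Reads `chunk_size` bytes at a time and carries over the tail of the
/// previous chunk so matches spanning two chunks are not missed.
/// (Naive search stands in for an Aho-Corasick automaton.)
fn contains_any<R: Read>(
    mut reader: R,
    needles: &[&[u8]],
    chunk_size: usize,
) -> io::Result<bool> {
    let max_len = needles.iter().map(|n| n.len()).max().unwrap_or(0);
    if max_len == 0 {
        return Ok(false);
    }
    // The longest possible partial match at a chunk boundary.
    let carry_len = max_len - 1;
    let mut chunk = vec![0u8; chunk_size.max(1)];
    let mut window: Vec<u8> = Vec::with_capacity(carry_len + chunk.len());

    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(false);
        }
        // Window = carried tail of earlier data + freshly read bytes.
        window.extend_from_slice(&chunk[..n]);
        let found = needles.iter().any(|needle| {
            !needle.is_empty() && window.windows(needle.len()).any(|w| w == *needle)
        });
        if found {
            return Ok(true);
        }
        // Keep only the last `carry_len` bytes for the next iteration.
        let keep = window.len().min(carry_len);
        window.drain(..window.len() - keep);
    }
}
```

Peak memory is bounded by `chunk_size + max_len - 1` bytes, whereas a whole-file read grows with the file.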

prek-ci-bot · 2026-03-15T07:55:40Z

📦 Cargo Bloat Comparison

Binary size change: +0.00% (24.7 MiB → 24.7 MiB)


Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.7%  81.7KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.1KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.8KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.0%  85.8%  10.2MiB                   And 23278 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.7MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  68.3KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.4KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.9KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.1%  85.9%  10.2MiB                   And 23264 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.7MiB

prek-ci-bot · 2026-03-15T07:55:40Z

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment

OS: Linux 6.14.0-1017-azure
CPU: 4 cores
prek version: prek 0.3.5+20 (e2a073d 2026-03-15)
Rust version: rustc 1.94.0 (4a4ef493e 2026-03-02)
Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

`prek --version`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base --version`	2.4 ± 0.1	2.2	2.9	1.04 ± 0.07
`prek-head --version`	2.3 ± 0.1	2.2	3.1	1.00

`prek list`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base list`	9.0 ± 1.0	8.7	14.8	1.02 ± 0.11
`prek-head list`	8.9 ± 0.1	8.7	9.3	1.00

`prek validate-config .pre-commit-config.yaml`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base validate-config .pre-commit-config.yaml`	3.1 ± 0.1	3.0	3.3	1.01 ± 0.02
`prek-head validate-config .pre-commit-config.yaml`	3.1 ± 0.0	3.0	3.2	1.00

`prek sample-config`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base sample-config`	2.7 ± 0.0	2.6	2.8	1.01 ± 0.03
`prek-head sample-config`	2.7 ± 0.1	2.6	2.8	1.00

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

`prek run --all-files (cold - no cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	149.5 ± 2.4	146.4	153.2	1.00
`prek-head run --all-files`	150.0 ± 1.8	147.8	153.5	1.00 ± 0.02

`prek run --all-files (warm - with cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	149.8 ± 1.3	148.0	152.3	1.00
`prek-head run --all-files`	150.7 ± 2.7	146.7	155.6	1.01 ± 0.02

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

`prek run --all-files (full builtin hook suite)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	150.5 ± 2.3	146.2	155.4	1.01 ± 0.02
`prek-head run --all-files`	149.3 ± 2.2	144.7	154.8	1.00

Individual Hook Performance

Benchmarking each hook individually on the test repo:

`prek run trailing-whitespace --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run trailing-whitespace --all-files`	30.0 ± 43.6	21.1	260.8	1.40 ± 2.03
`prek-head run trailing-whitespace --all-files`	21.5 ± 0.4	21.0	22.2	1.00

`prek run end-of-file-fixer --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run end-of-file-fixer --all-files`	27.0 ± 2.0	23.8	30.0	1.00
`prek-head run end-of-file-fixer --all-files`	27.0 ± 2.0	24.4	30.9	1.00 ± 0.10

`prek run check-json --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-json --all-files`	12.8 ± 0.2	12.2	13.2	1.03 ± 0.04
`prek-head run check-json --all-files`	12.5 ± 0.4	11.5	13.3	1.00

`prek run check-yaml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-yaml --all-files`	11.9 ± 0.3	11.5	13.2	1.01 ± 0.03
`prek-head run check-yaml --all-files`	11.7 ± 0.1	11.6	12.0	1.00

`prek run check-toml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-toml --all-files`	12.0 ± 0.2	11.5	12.5	1.00
`prek-head run check-toml --all-files`	12.0 ± 0.2	11.7	12.7	1.00 ± 0.03

`prek run check-xml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-xml --all-files`	12.0 ± 0.2	11.6	12.4	1.00
`prek-head run check-xml --all-files`	12.1 ± 0.2	11.7	12.5	1.00 ± 0.03

`prek run fix-byte-order-marker --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run fix-byte-order-marker --all-files`	20.4 ± 0.7	19.2	22.2	1.00
`prek-head run fix-byte-order-marker --all-files`	20.6 ± 0.7	19.2	22.0	1.01 ± 0.05

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

`prek install-hooks (cold - no cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base install-hooks`	4.8 ± 0.1	4.7	4.9	1.00
`prek-head install-hooks`	4.8 ± 0.0	4.8	4.9	1.01 ± 0.01

`prek install-hooks (warm - with cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base install-hooks`	4.8 ± 0.1	4.8	4.9	1.00
`prek-head install-hooks`	4.8 ± 0.1	4.8	4.9	1.00 ± 0.02

File Filtering/Scoping Performance

Testing different file selection modes:

`prek run (staged files only)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run`	52.6 ± 1.4	50.4	55.3	1.01 ± 0.03
`prek-head run`	52.4 ± 1.1	50.8	54.7	1.00

`prek run --files '*.json' (specific file type)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --files '*.json'`	9.1 ± 0.1	8.9	9.3	1.00
`prek-head run --files '*.json'`	9.1 ± 0.1	8.9	9.2	1.00 ± 0.02

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

`prek run --dry-run --all-files (measures init overhead)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --dry-run --all-files`	14.1 ± 0.3	13.8	15.0	1.00 ± 0.02
`prek-head run --dry-run --all-files`	14.1 ± 0.1	13.8	14.3	1.00

Meta Hooks Performance

Benchmarking meta hooks separately:

`prek run check-hooks-apply --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-hooks-apply --all-files`	14.2 ± 0.2	13.9	14.5	1.12 ± 0.02
`prek-head run check-hooks-apply --all-files`	12.6 ± 0.1	12.5	12.8	1.00

`prek run check-useless-excludes --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-useless-excludes --all-files`	12.5 ± 0.1	12.3	12.8	1.00
`prek-head run check-useless-excludes --all-files`	12.5 ± 0.1	12.4	12.7	1.00 ± 0.01

`prek run identity --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run identity --all-files`	11.0 ± 0.1	10.9	11.2	1.00
`prek-head run identity --all-files`	11.0 ± 0.1	10.8	11.2	1.00 ± 0.01

codecov · 2026-03-15T07:56:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.58%. Comparing base (2c53121) to head (a3b3525).
⚠️ Report is 1 commit behind head on optimize-fix-end-of-file.

Additional details and impacted files

@@                     Coverage Diff                      @@
##           optimize-fix-end-of-file    #1790      +/-   ##
============================================================
- Coverage                     91.69%   89.58%   -2.12%     
============================================================
  Files                            98       98              
  Lines                         19999    20019      +20     
============================================================
- Hits                          18339    17934     -405     
- Misses                         1660     2085     +425


j178 added the performance label (Performance improvements) Mar 15, 2026

Copilot AI review requested due to automatic review settings March 15, 2026 07:34

Copilot started reviewing on behalf of j178 March 15, 2026 07:36

j178 changed the base branch from master to optimize-fix-end-of-file March 15, 2026 07:37

Copilot AI reviewed Mar 15, 2026


j178 added 2 commits March 15, 2026 16:58

Optimize fix_byte_order_marker by shifting file contents in place

992068b

Benchmark fix-byte-order-marker in hyperfine script

a3b3525

j178 force-pushed the optimize-fix-byte-order-marker branch from 6ace0dc to a3b3525 March 15, 2026 09:01

j178 force-pushed the optimize-fix-end-of-file branch from 4c198bf to b071af4 March 15, 2026 09:01

Base automatically changed from optimize-fix-end-of-file to optimize-detect-private-key March 15, 2026 09:22

Tweak

0936d76

j178 changed the base branch from optimize-detect-private-key to master March 15, 2026 09:25

j178 merged commit d6f6e43 into master Mar 15, 2026

j178 deleted the optimize-fix-byte-order-marker branch March 15, 2026 09:25

BrewTestBot mentioned this pull request Mar 16, 2026

prek 0.3.6 Homebrew/homebrew-core#272532

Merged
