Optimize fix_byte_order_marker by shifting file contents in place#1790

Merged
j178 merged 3 commits into master from optimize-fix-byte-order-marker
Mar 15, 2026

Conversation

@j178
Owner

@j178 j178 commented Mar 15, 2026

Shift file contents in place for BOM removal.

@j178 j178 added the performance Performance improvements label Mar 15, 2026
Copilot AI review requested due to automatic review settings March 15, 2026 07:34
@j178 j178 changed the base branch from master to optimize-fix-end-of-file March 15, 2026 07:37
@chatgpt-codex-connector

💡 Codex Review

let mut file = fs_err::tokio::OpenOptions::new()
    .read(true)
    .write(true)
    .open(&file_path)

P2: Open files read-only until BOM removal is needed

This now requires write permission on every candidate file before even checking the first 3 bytes, so a read-only file without a BOM will fail the hook with EACCES instead of being reported as unchanged. The previous logic only needed write access when a BOM was actually present, so this is a behavior regression for repositories that contain non-writable tracked files.
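
A minimal sketch of the approach this review suggests, assuming synchronous std::fs rather than the fs_err::tokio API used in the PR. The names has_bom and open_for_fix are hypothetical and do not come from prek. The file is probed read-only, and write access is requested only after a BOM has been found:

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read};
use std::path::Path;

/// The three bytes of a UTF-8 byte order marker.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: checks for a leading BOM using a read-only handle,
/// so files without write permission can still be inspected.
fn has_bom(path: &Path) -> io::Result<bool> {
    let mut file = File::open(path)?;
    let mut prefix = [0u8; 3];
    let mut filled = 0;
    // A single read may return fewer than 3 bytes, so loop until full or EOF.
    while filled < prefix.len() {
        let n = file.read(&mut prefix[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(filled == prefix.len() && prefix == UTF8_BOM)
}

/// Hypothetical helper: returns a read-write handle only when a fix is needed.
/// Files without a BOM yield `None` and are never opened for writing.
fn open_for_fix(path: &Path) -> io::Result<Option<File>> {
    if !has_bom(path)? {
        return Ok(None);
    }
    OpenOptions::new().read(true).write(true).open(path).map(Some)
}
```

With this shape, a read-only file without a BOM is reported as unchanged, and a permission error can only surface for files that actually need rewriting.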


let file_tags = filenames
    .par_iter()
    .filter_map(|filename| match tags_from_path(filename) {
        Ok(tags) => Some((*filename, tags)),

P1: Recompute tags for each hook invocation

Caching tags_from_path once at project-filter construction makes hook file selection stale after earlier hooks modify files, even though run_hook calls filter.for_hook repeatedly across the run. Because tags_from_path depends on file metadata/content (e.g., executable bit, shebang, text/binary detection), a fixer that changes those attributes can cause later hooks with types/exclude_types to be incorrectly skipped or included.
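
One way to keep a cache while avoiding the staleness described here is to drop the entries for files that a hook has modified. The sketch below is illustrative only: TagCache, its methods, and the tag values are hypothetical and are not taken from prek's filter code.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical per-project cache of computed file tags.
struct TagCache {
    entries: HashMap<PathBuf, Vec<String>>,
}

impl TagCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns cached tags, calling `compute` only on a cache miss.
    fn tags<F>(&mut self, path: &Path, compute: F) -> &[String]
    where
        F: FnOnce(&Path) -> Vec<String>,
    {
        self.entries
            .entry(path.to_path_buf())
            .or_insert_with(|| compute(path))
            .as_slice()
    }

    /// Drops the entry for a file a hook has just modified, so the next
    /// lookup recomputes tags from the file's current metadata and content.
    fn invalidate(&mut self, path: &Path) {
        self.entries.remove(path);
    }
}
```

Without the invalidate call, a second lookup keeps returning the tags computed before the file changed, which is the failure mode the review describes.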

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces several performance-oriented changes across prek’s built-in hooks and file filtering, including an in-place rewrite strategy when removing UTF-8 BOMs.

Changes:

  • Optimize fix_byte_order_marker to remove a UTF-8 BOM by shifting file contents in-place instead of rewriting via a full-buffer read.
  • Refactor detect_private_key to scan files incrementally using an aho-corasick matcher (new dependency).
  • Cache tags_from_path results in the CLI file filter to avoid recomputing tags for the same files.
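
The in-place shift named in the first item can be sketched as follows. This is an assumption-laden illustration using synchronous std::fs, not the PR's code; strip_bom_in_place and its chunk_size parameter are hypothetical names. Each chunk is read from 3 bytes ahead of where it is written, so only bytes that have already been read are overwritten, and the file is truncated by 3 bytes at the end:

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// The three bytes of a UTF-8 byte order marker.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: removes a leading UTF-8 BOM by shifting the rest of
/// the file toward the start, using one fixed-size buffer.
/// Returns `true` if the file was modified.
fn strip_bom_in_place(path: &Path, chunk_size: usize) -> io::Result<bool> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    if file.metadata()?.len() < UTF8_BOM.len() as u64 {
        return Ok(false);
    }
    let mut prefix = [0u8; 3];
    file.read_exact(&mut prefix)?;
    if prefix != UTF8_BOM {
        return Ok(false);
    }

    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut read_pos = UTF8_BOM.len() as u64; // first byte after the BOM
    let mut write_pos = 0u64; // always 3 bytes behind read_pos
    loop {
        file.seek(SeekFrom::Start(read_pos))?;
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // reached end of file
        }
        file.seek(SeekFrom::Start(write_pos))?;
        file.write_all(&buf[..n])?;
        read_pos += n as u64;
        write_pos += n as u64;
    }
    // Drop the 3 trailing bytes left over after the shift.
    file.set_len(write_pos)?;
    Ok(true)
}
```

Memory use is bounded by chunk_size regardless of file size. The trade-off is that the rewrite is not atomic: an interruption partway through could leave the file partially shifted.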

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs | Refactors file sizing / scanning helpers and uses a fixed-size scan buffer. |
| crates/prek/src/hooks/pre_commit_hooks/fix_byte_order_marker.rs | Implements in-place shifting to remove BOM without allocating a full second file buffer. |
| crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs | Switches from whole-file reads to chunked scanning with aho-corasick + carry window. |
| crates/prek/src/cli/run/filter.rs | Adds a per-project cache of computed file tags to reduce repeated tag detection work. |
| crates/prek/Cargo.toml | Adds aho-corasick as a dependency for prek. |
| Cargo.toml | Adds aho-corasick to workspace dependencies. |
| Cargo.lock | Locks the new aho-corasick dependency. |
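
The "chunked scanning + carry window" idea in the detect_private_key row can be illustrated with the standard library alone. Note the substitution: this sketch uses a naive substring search instead of the aho-corasick matcher the PR adds, and stream_contains_any is a hypothetical name. The last max_pattern_len - 1 bytes of each window are carried into the next one, so a pattern that straddles a chunk boundary is still found:

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns `true` if any pattern occurs in the stream.
/// Reads fixed-size chunks instead of loading the whole input into memory.
fn stream_contains_any<R: Read>(
    mut reader: R,
    patterns: &[&[u8]],
    chunk_size: usize,
) -> io::Result<bool> {
    let max_len = patterns.iter().map(|p| p.len()).max().unwrap_or(0);
    if max_len == 0 {
        return Ok(false);
    }
    // A match spanning two chunks needs at most max_len - 1 earlier bytes.
    let carry_len = max_len - 1;
    let mut chunk = vec![0u8; chunk_size.max(1)];
    let mut window: Vec<u8> = Vec::with_capacity(carry_len + chunk.len());
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(false); // end of input, nothing matched
        }
        window.extend_from_slice(&chunk[..n]);
        for pat in patterns {
            // Naive search, standing in for a multi-pattern automaton.
            if !pat.is_empty() && window.windows(pat.len()).any(|w| w == *pat) {
                return Ok(true);
            }
        }
        // Keep only the tail that could begin a match in the next chunk.
        let keep = carry_len.min(window.len());
        window.drain(..window.len() - keep);
    }
}
```

The window never grows beyond carry_len plus one chunk, so memory stays constant for arbitrarily large files.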

@prek-ci-bot

prek-ci-bot bot commented Mar 15, 2026

📦 Cargo Bloat Comparison

Binary size change: +0.00% (24.7 MiB → 24.7 MiB)

Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.7%  81.7KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.1KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.8KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.0%  85.8%  10.2MiB                   And 23278 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.7MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  68.3KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.4KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.9KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.1%  85.9%  10.2MiB                   And 23264 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.7MiB

@prek-ci-bot

prek-ci-bot bot commented Mar 15, 2026

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment
  • OS: Linux 6.14.0-1017-azure
  • CPU: 4 cores
  • prek version: prek 0.3.5+20 (e2a073d 2026-03-15)
  • Rust version: rustc 1.94.0 (4a4ef493e 2026-03-02)
  • Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

prek --version

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base --version | 2.4 ± 0.1 | 2.2 | 2.9 | 1.04 ± 0.07 |
| prek-head --version | 2.3 ± 0.1 | 2.2 | 3.1 | 1.00 |

prek list

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base list | 9.0 ± 1.0 | 8.7 | 14.8 | 1.02 ± 0.11 |
| prek-head list | 8.9 ± 0.1 | 8.7 | 9.3 | 1.00 |

prek validate-config .pre-commit-config.yaml

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base validate-config .pre-commit-config.yaml | 3.1 ± 0.1 | 3.0 | 3.3 | 1.01 ± 0.02 |
| prek-head validate-config .pre-commit-config.yaml | 3.1 ± 0.0 | 3.0 | 3.2 | 1.00 |

prek sample-config

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base sample-config | 2.7 ± 0.0 | 2.6 | 2.8 | 1.01 ± 0.03 |
| prek-head sample-config | 2.7 ± 0.1 | 2.6 | 2.8 | 1.00 |

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

prek run --all-files (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 149.5 ± 2.4 | 146.4 | 153.2 | 1.00 |
| prek-head run --all-files | 150.0 ± 1.8 | 147.8 | 153.5 | 1.00 ± 0.02 |

prek run --all-files (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 149.8 ± 1.3 | 148.0 | 152.3 | 1.00 |
| prek-head run --all-files | 150.7 ± 2.7 | 146.7 | 155.6 | 1.01 ± 0.02 |

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

prek run --all-files (full builtin hook suite)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --all-files | 150.5 ± 2.3 | 146.2 | 155.4 | 1.01 ± 0.02 |
| prek-head run --all-files | 149.3 ± 2.2 | 144.7 | 154.8 | 1.00 |

Individual Hook Performance

Benchmarking each hook individually on the test repo:

prek run trailing-whitespace --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run trailing-whitespace --all-files | 30.0 ± 43.6 | 21.1 | 260.8 | 1.40 ± 2.03 |
| prek-head run trailing-whitespace --all-files | 21.5 ± 0.4 | 21.0 | 22.2 | 1.00 |

prek run end-of-file-fixer --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run end-of-file-fixer --all-files | 27.0 ± 2.0 | 23.8 | 30.0 | 1.00 |
| prek-head run end-of-file-fixer --all-files | 27.0 ± 2.0 | 24.4 | 30.9 | 1.00 ± 0.10 |

prek run check-json --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-json --all-files | 12.8 ± 0.2 | 12.2 | 13.2 | 1.03 ± 0.04 |
| prek-head run check-json --all-files | 12.5 ± 0.4 | 11.5 | 13.3 | 1.00 |

prek run check-yaml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-yaml --all-files | 11.9 ± 0.3 | 11.5 | 13.2 | 1.01 ± 0.03 |
| prek-head run check-yaml --all-files | 11.7 ± 0.1 | 11.6 | 12.0 | 1.00 |

prek run check-toml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-toml --all-files | 12.0 ± 0.2 | 11.5 | 12.5 | 1.00 |
| prek-head run check-toml --all-files | 12.0 ± 0.2 | 11.7 | 12.7 | 1.00 ± 0.03 |

prek run check-xml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-xml --all-files | 12.0 ± 0.2 | 11.6 | 12.4 | 1.00 |
| prek-head run check-xml --all-files | 12.1 ± 0.2 | 11.7 | 12.5 | 1.00 ± 0.03 |

prek run fix-byte-order-marker --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run fix-byte-order-marker --all-files | 20.4 ± 0.7 | 19.2 | 22.2 | 1.00 |
| prek-head run fix-byte-order-marker --all-files | 20.6 ± 0.7 | 19.2 | 22.0 | 1.01 ± 0.05 |

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

prek install-hooks (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base install-hooks | 4.8 ± 0.1 | 4.7 | 4.9 | 1.00 |
| prek-head install-hooks | 4.8 ± 0.0 | 4.8 | 4.9 | 1.01 ± 0.01 |

prek install-hooks (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base install-hooks | 4.8 ± 0.1 | 4.8 | 4.9 | 1.00 |
| prek-head install-hooks | 4.8 ± 0.1 | 4.8 | 4.9 | 1.00 ± 0.02 |

File Filtering/Scoping Performance

Testing different file selection modes:

prek run (staged files only)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run | 52.6 ± 1.4 | 50.4 | 55.3 | 1.01 ± 0.03 |
| prek-head run | 52.4 ± 1.1 | 50.8 | 54.7 | 1.00 |

prek run --files '*.json' (specific file type)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --files '*.json' | 9.1 ± 0.1 | 8.9 | 9.3 | 1.00 |
| prek-head run --files '*.json' | 9.1 ± 0.1 | 8.9 | 9.2 | 1.00 ± 0.02 |

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

prek run --dry-run --all-files (measures init overhead)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run --dry-run --all-files | 14.1 ± 0.3 | 13.8 | 15.0 | 1.00 ± 0.02 |
| prek-head run --dry-run --all-files | 14.1 ± 0.1 | 13.8 | 14.3 | 1.00 |

Meta Hooks Performance

Benchmarking meta hooks separately:

prek run check-hooks-apply --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-hooks-apply --all-files | 14.2 ± 0.2 | 13.9 | 14.5 | 1.12 ± 0.02 |
| prek-head run check-hooks-apply --all-files | 12.6 ± 0.1 | 12.5 | 12.8 | 1.00 |

prek run check-useless-excludes --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run check-useless-excludes --all-files | 12.5 ± 0.1 | 12.3 | 12.8 | 1.00 |
| prek-head run check-useless-excludes --all-files | 12.5 ± 0.1 | 12.4 | 12.7 | 1.00 ± 0.01 |

prek run identity --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| prek-base run identity --all-files | 11.0 ± 0.1 | 10.9 | 11.2 | 1.00 |
| prek-head run identity --all-files | 11.0 ± 0.1 | 10.8 | 11.2 | 1.00 ± 0.01 |

@codecov

codecov bot commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.58%. Comparing base (2c53121) to head (a3b3525).
⚠️ Report is 1 commit behind head on optimize-fix-end-of-file.

Additional details and impacted files
@@                     Coverage Diff                      @@
##           optimize-fix-end-of-file    #1790      +/-   ##
============================================================
- Coverage                     91.69%   89.58%   -2.12%     
============================================================
  Files                            98       98              
  Lines                         19999    20019      +20     
============================================================
- Hits                          18339    17934     -405     
- Misses                         1660     2085     +425     

@j178 j178 force-pushed the optimize-fix-byte-order-marker branch from 6ace0dc to a3b3525 Compare March 15, 2026 09:01
@j178 j178 force-pushed the optimize-fix-end-of-file branch from 4c198bf to b071af4 Compare March 15, 2026 09:01
Base automatically changed from optimize-fix-end-of-file to optimize-detect-private-key March 15, 2026 09:22
@j178 j178 changed the base branch from optimize-detect-private-key to master March 15, 2026 09:25
@j178 j178 merged commit d6f6e43 into master Mar 15, 2026
@j178 j178 deleted the optimize-fix-byte-order-marker branch March 15, 2026 09:25
Labels

performance Performance improvements

2 participants