Optimize fix_end_of_file by reducing file operations #1792

Merged
j178 merged 1 commit into optimize-detect-private-key from optimize-fix-end-of-file
Mar 15, 2026

@j178
Owner

@j178 j178 commented Mar 15, 2026

Reduce file operations in fix_end_of_file.

@j178 j178 added the performance Performance improvements label Mar 15, 2026
Copilot AI review requested due to automatic review settings March 15, 2026 07:34
@j178 j178 changed the base branch from master to optimize-detect-private-key March 15, 2026 07:37
@chatgpt-codex-connector

💡 Codex Review

    let file_tags = filenames
        .par_iter()
        .filter_map(|filename| match tags_from_path(filename) {
            Ok(tags) => Some((*filename, tags)),
            Err(err) => {

P2: Recompute file tags per hook invocation

Building file_tags once in for_project makes type filtering use a stale snapshot for the rest of the project run, but hooks are executed sequentially and can modify or remove files between invocations. That means later hooks may be incorrectly skipped (e.g., mode/content changes should alter tags) or run on paths that no longer exist, whereas the previous per-hook tags_from_path calls reflected the current filesystem state each time.
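
To illustrate the concern raised in this review, here is a minimal sketch of a tag cache that returns a stale answer until the entry for a modified file is dropped. `TagCache`, `invalidate`, and the closure standing in for `tags_from_path` are hypothetical names chosen for illustration; they are not prek's actual API, and invalidation is only one possible mitigation next to recomputing tags per hook.

```rust
use std::collections::HashMap;

/// Hypothetical per-run cache of file tags, keyed by path.
struct TagCache<F> {
    compute: F, // stand-in for `tags_from_path` (assumption)
    entries: HashMap<String, Vec<String>>,
}

impl<F> TagCache<F>
where
    F: FnMut(&str) -> Option<Vec<String>>,
{
    fn new(compute: F) -> Self {
        Self { compute, entries: HashMap::new() }
    }

    /// Returns cached tags, computing them only on the first lookup.
    fn tags(&mut self, path: &str) -> Option<&Vec<String>> {
        if !self.entries.contains_key(path) {
            let tags = (self.compute)(path)?;
            self.entries.insert(path.to_string(), tags);
        }
        self.entries.get(path)
    }

    /// Drops the cached entry so the next lookup reflects the current file state.
    fn invalidate(&mut self, path: &str) {
        self.entries.remove(path);
    }
}
```

Calling `invalidate` for every path a hook reports as modified would keep most of the caching benefit while avoiding the stale snapshot; whether that bookkeeping is worth it compared with recomputing per hook is a design choice this sketch does not settle.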

Contributor

Copilot AI left a comment

Pull request overview

This PR reduces filesystem and CPU overhead in several hooks/CLI paths by avoiding redundant reads/seeks and reusing computed metadata (file tags and key-match automaton) across filtering/scanning operations.

Changes:

  • fix_end_of_file: compute file size via seek and avoid extra flush/shutdown calls; pass file length into the backwards scanner.
  • detect_private_key: switch from full-file read + per-pattern search to streaming chunk scan using a shared Aho–Corasick matcher.
  • FileFilter: cache tags_from_path results per file to avoid recomputation during subsequent filtering.
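
The streaming scan described in the `detect_private_key` bullet can be sketched with the standard library alone. This is an illustration under stated assumptions: it swaps a naive substring search in for the Aho–Corasick matcher the PR uses, and `MARKERS`, `contains_marker`, and the chunk size parameter are hypothetical names, not taken from the PR. The point it shows is the overlap carried between chunks so that a marker split across a chunk boundary is still found.

```rust
use std::io::{self, Read};

/// Illustrative markers only; the hook's real pattern list is not shown in this thread.
const MARKERS: &[&[u8]] = &[b"BEGIN RSA PRIVATE KEY", b"BEGIN OPENSSH PRIVATE KEY"];

/// Scans `reader` in chunks of `chunk_size` bytes and reports whether any marker occurs.
fn contains_marker<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<bool> {
    let max_len = MARKERS.iter().map(|m| m.len()).max().unwrap_or(0);
    if max_len == 0 {
        return Ok(false);
    }
    // A match can straddle a boundary by at most `max_len - 1` bytes.
    let overlap = max_len - 1;
    let mut window: Vec<u8> = Vec::with_capacity(chunk_size + overlap);
    let mut chunk = vec![0u8; chunk_size.max(1)];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(false); // end of input, nothing found
        }
        window.extend_from_slice(&chunk[..n]);
        // Naive search; the PR uses a shared Aho–Corasick automaton instead.
        if MARKERS.iter().any(|m| window.windows(m.len()).any(|w| w == *m)) {
            return Ok(true);
        }
        // Keep only the tail that could begin a match completed by the next chunk.
        let keep = overlap.min(window.len());
        window.drain(..window.len() - keep);
    }
}
```

Memory use stays bounded by `chunk_size + overlap` regardless of file size, which is the property that motivates streaming over a full-file read.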

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs | Reduces file operations by reusing a single EOF seek and stack buffer during scan. |
| crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs | Streams files in chunks and uses a shared Aho–Corasick matcher for faster key detection. |
| crates/prek/src/cli/run/filter.rs | Caches file tags in FileFilter to avoid repeated tags_from_path calls. |
| crates/prek/Cargo.toml | Adds aho-corasick dependency to the prek crate. |
| Cargo.toml | Adds aho-corasick to workspace dependencies. |
| Cargo.lock | Locks the new aho-corasick dependency. |

Comments suppressed due to low confidence (1)

crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs:94

  • find_last_non_ending repeatedly seeks back by block_size from the current cursor position, but after read_exact the cursor is back at EOF, so the loop re-reads the same trailing block instead of scanning earlier blocks. This can mis-handle files with >4KB of trailing newlines (e.g., incorrectly treating them as all line endings). Consider seeking to an absolute offset based on data_len and read_len (e.g., SeekFrom::Start(data_len - read_len - block_size as u64)) so each iteration reads the next block moving backwards; this also lets you drop the unwrap() on the i64 conversion.
    while read_len < data_len {
        let block_size = MAX_SCAN_SIZE.min(usize::try_from(data_len - read_len)?);
        // SAFETY: block_size is guaranteed to be less than or equal to MAX_SCAN_SIZE
        reader
            .seek(SeekFrom::Current(-i64::try_from(block_size).unwrap()))
            .await?;
        reader.read_exact(&mut buf[..block_size]).await?;
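
A minimal synchronous sketch of the absolute-offset approach suggested in this comment, using `std::io` rather than the async reader in the PR. `BLOCK` stands in for `MAX_SCAN_SIZE` (whose value is not shown in this thread), and the function name and return convention are assumptions made for illustration.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical block size standing in for the PR's MAX_SCAN_SIZE.
const BLOCK: usize = 4096;

/// Returns the offset just past the last byte that is neither `\n` nor `\r`,
/// or 0 if the data consists only of line endings (or is empty).
fn last_non_ending_offset<R: Read + Seek>(reader: &mut R, data_len: u64) -> io::Result<u64> {
    let mut buf = [0u8; BLOCK];
    let mut read_len: u64 = 0; // bytes already scanned, counted back from EOF
    while read_len < data_len {
        let remaining = data_len - read_len;
        let block_size = remaining.min(BLOCK as u64) as usize;
        // Absolute start of the next block, so each iteration moves strictly backwards.
        let start = remaining - block_size as u64;
        reader.seek(SeekFrom::Start(start))?;
        reader.read_exact(&mut buf[..block_size])?;
        if let Some(pos) = buf[..block_size]
            .iter()
            .rposition(|&b| b != b'\n' && b != b'\r')
        {
            return Ok(start + pos as u64 + 1);
        }
        read_len += block_size as u64;
    }
    Ok(0)
}
```

A caller would obtain `data_len` from a single `seek(SeekFrom::End(0))`, which matches the "compute file size via seek" change described in the overview. Because every seek is relative to the start of the file, no signed conversion of `block_size` is needed.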

@codecov

codecov bot commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.70%. Comparing base (2c53121) to head (b071af4).

Additional details and impacted files
@@                     Coverage Diff                      @@
##           optimize-detect-private-key    #1792   +/-   ##
============================================================
  Coverage                        91.69%   91.70%           
============================================================
  Files                               98       98           
  Lines                            19999    20011   +12     
============================================================
+ Hits                             18339    18352   +13     
+ Misses                            1660     1659    -1     

@prek-ci-bot

prek-ci-bot bot commented Mar 15, 2026

📦 Cargo Bloat Comparison

Binary size change: -0.40% (24.8 MiB → 24.7 MiB)

Expand for cargo-bloat output

Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.7%  81.0KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6%  77.1KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.1KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.8KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  31.8KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  22.9KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.0%  85.8%  10.1MiB                   And 23215 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.7MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.7%  81.7KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.2KiB              prek prek::run::{{closure}}
 0.2%   0.3%  41.8KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.0%  85.8%  10.2MiB                   And 23293 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.8MiB

@prek-ci-bot

prek-ci-bot bot commented Mar 15, 2026

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment
  • OS: Linux 6.14.0-1017-azure
  • CPU: 4 cores
  • prek version: prek 0.3.5+20 (e7180e4 2026-03-15)
  • Rust version: rustc 1.94.0 (4a4ef493e 2026-03-02)
  • Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

prek --version

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base --version | 2.4 ± 0.2 | 2.3 | 3.8 | 1.03 ± 0.08 |
| prek-head --version | 2.4 ± 0.1 | 2.3 | 2.7 | 1.00 |

prek list

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base list | 9.1 ± 0.3 | 8.8 | 10.5 | 1.00 |
| prek-head list | 9.1 ± 0.3 | 8.7 | 11.0 | 1.00 ± 0.05 |

prek validate-config .pre-commit-config.yaml

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base validate-config .pre-commit-config.yaml | 3.2 ± 0.1 | 3.1 | 3.3 | 1.00 ± 0.03 |
| prek-head validate-config .pre-commit-config.yaml | 3.1 ± 0.1 | 3.1 | 3.3 | 1.00 |

prek sample-config

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base sample-config | 2.7 ± 0.1 | 2.6 | 2.9 | 1.00 |
| prek-head sample-config | 2.7 ± 0.0 | 2.6 | 2.8 | 1.00 ± 0.03 |

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

prek run --all-files (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run --all-files | 155.8 ± 3.1 | 151.2 | 160.0 | 1.00 |
| prek-head run --all-files | 157.3 ± 7.3 | 151.9 | 177.4 | 1.01 ± 0.05 |

prek run --all-files (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run --all-files | 157.4 ± 3.6 | 153.3 | 168.6 | 1.00 ± 0.05 |
| prek-head run --all-files | 156.9 ± 6.7 | 151.4 | 182.8 | 1.00 |

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

prek run --all-files (full builtin hook suite)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run --all-files | 158.6 ± 2.8 | 152.0 | 164.8 | 1.00 |
| prek-head run --all-files | 159.6 ± 20.9 | 150.7 | 290.5 | 1.01 ± 0.13 |

Individual Hook Performance

Benchmarking each hook individually on the test repo:

prek run trailing-whitespace --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run trailing-whitespace --all-files | 22.7 ± 0.4 | 21.9 | 23.7 | 1.00 |
| prek-head run trailing-whitespace --all-files | 22.7 ± 0.8 | 21.8 | 25.3 | 1.00 ± 0.04 |

prek run end-of-file-fixer --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run end-of-file-fixer --all-files | 29.7 ± 1.9 | 26.5 | 33.2 | 1.06 ± 0.10 |
| prek-head run end-of-file-fixer --all-files | 28.0 ± 1.9 | 24.4 | 31.7 | 1.00 |

prek run check-json --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-json --all-files | 13.1 ± 0.3 | 12.7 | 14.0 | 1.06 ± 0.03 |
| prek-head run check-json --all-files | 12.3 ± 0.3 | 11.9 | 12.8 | 1.00 |

prek run check-yaml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-yaml --all-files | 12.0 ± 0.3 | 11.7 | 12.8 | 1.00 |
| prek-head run check-yaml --all-files | 12.3 ± 0.4 | 11.7 | 13.4 | 1.02 ± 0.04 |

prek run check-toml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-toml --all-files | 12.3 ± 0.2 | 12.0 | 12.9 | 1.01 ± 0.03 |
| prek-head run check-toml --all-files | 12.3 ± 0.3 | 11.7 | 13.0 | 1.00 |

prek run check-xml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-xml --all-files | 12.2 ± 0.3 | 11.7 | 12.8 | 1.00 |
| prek-head run check-xml --all-files | 12.6 ± 0.5 | 11.8 | 14.7 | 1.03 ± 0.05 |

prek run detect-private-key --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run detect-private-key --all-files | 19.0 ± 1.6 | 16.7 | 22.4 | 1.02 ± 0.10 |
| prek-head run detect-private-key --all-files | 18.6 ± 1.1 | 16.7 | 20.7 | 1.00 |

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

prek install-hooks (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base install-hooks | 4.9 ± 0.0 | 4.9 | 5.0 | 1.01 ± 0.01 |
| prek-head install-hooks | 4.9 ± 0.0 | 4.8 | 4.9 | 1.00 |

prek install-hooks (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base install-hooks | 4.9 ± 0.1 | 4.8 | 5.1 | 1.00 |
| prek-head install-hooks | 5.0 ± 0.1 | 4.9 | 5.1 | 1.02 ± 0.03 |

File Filtering/Scoping Performance

Testing different file selection modes:

prek run (staged files only)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run | 19.0 ± 0.2 | 18.8 | 19.5 | 1.00 ± 0.01 |
| prek-head run | 19.0 ± 0.2 | 18.6 | 19.5 | 1.00 |

prek run --files '*.json' (specific file type)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run --files '*.json' | 7.8 ± 0.8 | 7.4 | 11.4 | 1.03 ± 0.11 |
| prek-head run --files '*.json' | 7.6 ± 0.1 | 7.4 | 7.8 | 1.00 |

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

prek run --dry-run --all-files (measures init overhead)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run --dry-run --all-files | 12.5 ± 0.2 | 12.3 | 13.0 | 1.00 |
| prek-head run --dry-run --all-files | 12.7 ± 0.7 | 12.3 | 15.8 | 1.01 ± 0.06 |

Meta Hooks Performance

Benchmarking meta hooks separately:

prek run check-hooks-apply --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-hooks-apply --all-files | 14.3 ± 0.3 | 13.9 | 15.1 | 1.00 |
| prek-head run check-hooks-apply --all-files | 14.3 ± 0.1 | 14.1 | 14.5 | 1.00 ± 0.02 |

prek run check-useless-excludes --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run check-useless-excludes --all-files | 13.9 ± 0.7 | 12.4 | 14.6 | 1.09 ± 0.06 |
| prek-head run check-useless-excludes --all-files | 12.7 ± 0.1 | 12.6 | 13.1 | 1.00 |

prek run identity --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| prek-base run identity --all-files | 11.2 ± 0.1 | 11.1 | 11.3 | 1.00 |
| prek-head run identity --all-files | 11.3 ± 0.2 | 11.1 | 11.6 | 1.01 ± 0.02 |

@j178 j178 force-pushed the optimize-detect-private-key branch from a8ad45c to f475271 Compare March 15, 2026 09:01
@j178 j178 force-pushed the optimize-fix-end-of-file branch from 4c198bf to b071af4 Compare March 15, 2026 09:01
@j178 j178 merged commit 69e87bb into optimize-detect-private-key Mar 15, 2026
53 of 68 checks passed
@j178 j178 deleted the optimize-fix-end-of-file branch March 15, 2026 09:22
j178 added a commit that referenced this pull request Mar 15, 2026
Reduce file operations in fix_end_of_file.
j178 added a commit that referenced this pull request Mar 15, 2026
Reduce file operations in fix_end_of_file.
j178 added a commit that referenced this pull request Mar 15, 2026
j178 added a commit that referenced this pull request Mar 15, 2026

Labels

performance Performance improvements

2 participants