Optimize `detect_private_key` by chunked reading and using aho-corasick by j178 · Pull Request #1791 · j178/prek

j178 · 2026-03-15T07:34:46Z

Optimize detect_private_key scanning.

chatgpt-codex-connector · 2026-03-15T07:40:24Z

💡 Codex Review

Lines 137 to 140 in a8ad45c

    
    let file_tags = filenames
        .par_iter()
        .filter_map(|filename| match tags_from_path(filename) {
            Ok(tags) => Some((*filename, tags)),

Recompute file tags after hooks mutate files

Caching `tags_from_path` once in `for_project` makes tag filtering stale for the rest of the project run. `run_hooks` builds a single `FileFilter` and reuses it across all priority groups, but earlier hooks can modify or delete files, or change their executable bit or shebang. Later hooks are then filtered with outdated tags, so they can run on files that no longer exist or no longer match (or skip files that now should match). This is a behavior regression from the previous per-hook tag lookup.
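One way to picture the staleness concern is a cache that can be dropped between hook groups. The sketch below is illustrative only: `TagCache`, `tags_for`, and `invalidate` are hypothetical names, and the `compute` closure is a stand-in for `tags_from_path`, not prek's actual `FileFilter` code.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical per-project tag cache (not prek's real `FileFilter`).
/// `compute` stands in for `tags_from_path` and returns `None` on failure.
pub struct TagCache<F: Fn(&Path) -> Option<Vec<String>>> {
    compute: F,
    cache: HashMap<PathBuf, Vec<String>>,
}

impl<F: Fn(&Path) -> Option<Vec<String>>> TagCache<F> {
    pub fn new(compute: F) -> Self {
        Self { compute, cache: HashMap::new() }
    }

    /// Return the cached tags for `path`, computing them on first use.
    /// Paths whose tags cannot be computed (for example, deleted files)
    /// yield `None` and are not cached.
    pub fn tags_for(&mut self, path: &Path) -> Option<&[String]> {
        if !self.cache.contains_key(path) {
            let tags = (self.compute)(path)?;
            self.cache.insert(path.to_path_buf(), tags);
        }
        self.cache.get(path).map(Vec::as_slice)
    }

    /// Drop every cached entry, e.g. after a group of hooks that may have
    /// modified, deleted, or re-permissioned files.
    pub fn invalidate(&mut self) {
        self.cache.clear();
    }
}
```

Calling `invalidate` after each priority group trades some of the caching benefit for correctness; whether that trade is acceptable depends on how often hooks actually mutate files.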


Copilot

Pull request overview

Optimizes private key detection (a built-in pre-commit hook) by switching from whole-file reads to chunked scanning with an Aho–Corasick matcher, and reduces repeated file-tag identification work during hook filtering.

Changes:

Implement chunked streaming scan in detect_private_key using a prebuilt aho_corasick::AhoCorasick matcher with boundary carryover.
Cache tags_from_path results per project in FileFilter to avoid recomputing tags during multiple filtering passes.
Add aho-corasick as a workspace dependency and enable it for the prek crate.
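The boundary carryover mentioned above can be sketched as follows. This is a minimal, std-only illustration, not the PR's code: a naive `windows` search stands in for the `aho_corasick::AhoCorasick` matcher, and `MARKERS`, `contains_marker`, and `has_private_key` are hypothetical names with an illustrative marker list.

```rust
use std::io::{self, Read};

/// Illustrative markers only; the hook's real pattern list is not shown here.
const MARKERS: &[&[u8]] = &[b"BEGIN RSA PRIVATE KEY", b"BEGIN OPENSSH PRIVATE KEY"];

/// Naive multi-pattern search, standing in for an Aho-Corasick automaton.
fn contains_marker(haystack: &[u8]) -> bool {
    MARKERS
        .iter()
        .any(|m| haystack.windows(m.len()).any(|w| w == *m))
}

/// Scan `reader` in reads of at most `chunk_size` bytes, carrying the last
/// `longest_marker - 1` bytes forward so that a marker split across two
/// reads is still seen whole.
pub fn has_private_key<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<bool> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let carry_len = MARKERS.iter().map(|m| m.len()).max().unwrap_or(1) - 1;
    let mut buf = vec![0u8; carry_len + chunk_size];
    let mut carried = 0; // number of valid carry bytes at the start of `buf`

    loop {
        let n = reader.read(&mut buf[carried..carried + chunk_size])?;
        if n == 0 {
            // EOF: the carried bytes were already searched in the last pass.
            return Ok(false);
        }
        let filled = carried + n;
        if contains_marker(&buf[..filled]) {
            return Ok(true);
        }
        // Keep only the tail that could be the start of a split marker.
        let keep = filled.min(carry_len);
        buf.copy_within(filled - keep..filled, 0);
        carried = keep;
    }
}
```

Memory use stays bounded by `carry_len + chunk_size` regardless of file size, and the scan can return as soon as the first marker is found.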

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File	Description
crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs	Stream-based scanning with Aho–Corasick + small carry buffer to catch boundary-split markers.
crates/prek/src/cli/run/filter.rs	Adds a per-project tag cache (`file_tags`) used by type/tag filtering.
crates/prek/Cargo.toml	Adds `aho-corasick` dependency for the `prek` crate.
Cargo.toml	Adds `aho-corasick` to workspace dependencies.
Cargo.lock	Records the new dependency in the `prek` package’s dependency list.

crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs

codecov · 2026-03-15T07:57:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.72%. Comparing base (d6f6e43) to head (f441983).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1791   +/-   ##
=======================================
  Coverage   91.72%   91.72%           
=======================================
  Files          98       98           
  Lines       20025    20037   +12     
=======================================
+ Hits        18367    18379   +12     
  Misses       1658     1658

prek-ci-bot · 2026-03-15T07:58:26Z

📦 Cargo Bloat Comparison

Binary size change: +0.00% (24.8 MiB → 24.8 MiB)

Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.6%  76.3KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.1KiB              prek prek::run::{{closure}}
 0.2%   0.3%  42.0KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  31.8KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.4KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.1%  86.0%  10.2MiB                   And 23274 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.8MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_38_0_aes_gcm_decrypt_avx512
 0.3%   0.6%  76.3KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.8KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  68.0KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  50.6KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  46.9KiB              prek prek::run::{{closure}}
 0.2%   0.3%  42.0KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_38_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.9KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.4KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  22.0KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.1%  86.0%  10.2MiB                   And 23301 smaller methods. Use -n N to show more.
47.8% 100.0%  11.8MiB                   .text section size, the file size is 24.8MiB

prek-ci-bot · 2026-03-15T07:58:27Z

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment

OS: Linux 6.14.0-1017-azure
CPU: 4 cores
prek version: prek 0.3.5+21 (c11e786 2026-03-15)
Rust version: rustc 1.94.0 (4a4ef493e 2026-03-02)
Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

`prek --version`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base --version`	2.4 ± 0.1	2.2	2.6	1.01 ± 0.08
`prek-head --version`	2.4 ± 0.2	2.2	2.8	1.00

`prek list`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base list`	9.1 ± 0.5	8.7	12.4	1.00
`prek-head list`	9.3 ± 0.7	8.7	12.8	1.02 ± 0.09

`prek validate-config .pre-commit-config.yaml`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base validate-config .pre-commit-config.yaml`	3.1 ± 0.1	3.0	3.2	1.01 ± 0.02
`prek-head validate-config .pre-commit-config.yaml`	3.1 ± 0.0	3.0	3.2	1.00

`prek sample-config`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base sample-config`	2.6 ± 0.0	2.6	2.7	1.00
`prek-head sample-config`	2.8 ± 0.2	2.6	3.3	1.07 ± 0.08

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

`prek run --all-files (cold - no cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	150.9 ± 4.1	147.3	161.8	1.00
`prek-head run --all-files`	152.8 ± 2.1	149.2	155.3	1.01 ± 0.03

`prek run --all-files (warm - with cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	151.9 ± 4.2	147.7	161.7	1.00
`prek-head run --all-files`	153.1 ± 5.0	147.4	165.7	1.01 ± 0.04

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

`prek run --all-files (full builtin hook suite)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --all-files`	151.7 ± 4.2	145.2	163.0	1.00
`prek-head run --all-files`	157.2 ± 22.9	147.5	312.0	1.04 ± 0.15

Individual Hook Performance

Benchmarking each hook individually on the test repo:

`prek run trailing-whitespace --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run trailing-whitespace --all-files`	22.4 ± 0.4	21.6	23.4	1.00
`prek-head run trailing-whitespace --all-files`	22.5 ± 1.3	21.3	27.3	1.00 ± 0.06

`prek run end-of-file-fixer --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run end-of-file-fixer --all-files`	28.9 ± 2.4	25.3	34.5	1.06 ± 0.10
`prek-head run end-of-file-fixer --all-files`	27.2 ± 1.3	25.4	30.6	1.00

`prek run check-json --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-json --all-files`	12.9 ± 0.5	12.1	13.8	1.01 ± 0.09
`prek-head run check-json --all-files`	12.8 ± 1.0	11.8	15.6	1.00

`prek run check-yaml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-yaml --all-files`	12.1 ± 0.2	11.7	12.8	1.00
`prek-head run check-yaml --all-files`	12.6 ± 1.1	11.9	16.5	1.05 ± 0.09

`prek run check-toml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-toml --all-files`	12.3 ± 0.2	11.9	12.6	1.00
`prek-head run check-toml --all-files`	12.7 ± 1.1	11.7	15.9	1.04 ± 0.09

`prek run check-xml --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-xml --all-files`	12.4 ± 0.3	11.9	12.9	1.00
`prek-head run check-xml --all-files`	12.5 ± 0.4	11.9	13.3	1.01 ± 0.04

`prek run detect-private-key --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run detect-private-key --all-files`	16.1 ± 1.3	14.6	19.8	1.00
`prek-head run detect-private-key --all-files`	18.4 ± 1.0	16.5	21.0	1.14 ± 0.11

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

`prek install-hooks (cold - no cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base install-hooks`	4.7 ± 0.0	4.7	4.8	1.00
`prek-head install-hooks`	4.7 ± 0.1	4.7	4.8	1.00 ± 0.01

`prek install-hooks (warm - with cache)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base install-hooks`	4.7 ± 0.0	4.7	4.8	1.00
`prek-head install-hooks`	4.9 ± 0.4	4.7	5.6	1.05 ± 0.08

File Filtering/Scoping Performance

Testing different file selection modes:

`prek run (staged files only)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run`	18.6 ± 0.2	18.3	18.9	1.00
`prek-head run`	18.7 ± 0.2	18.4	19.1	1.00 ± 0.01

`prek run --files '*.json' (specific file type)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --files '*.json'`	7.5 ± 0.1	7.4	7.7	1.00
`prek-head run --files '*.json'`	8.6 ± 1.1	7.4	11.3	1.15 ± 0.14

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

`prek run --dry-run --all-files (measures init overhead)`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run --dry-run --all-files`	10.5 ± 0.1	10.3	11.0	1.00 ± 0.02
`prek-head run --dry-run --all-files`	10.5 ± 0.1	10.3	10.7	1.00

Meta Hooks Performance

Benchmarking meta hooks separately:

`prek run check-hooks-apply --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-hooks-apply --all-files`	14.3 ± 0.1	14.2	14.6	1.00
`prek-head run check-hooks-apply --all-files`	15.1 ± 1.2	14.1	17.4	1.05 ± 0.08

`prek run check-useless-excludes --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run check-useless-excludes --all-files`	13.6 ± 0.7	12.4	14.3	1.08 ± 0.06
`prek-head run check-useless-excludes --all-files`	12.6 ± 0.2	12.4	13.0	1.00

`prek run identity --all-files`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`prek-base run identity --all-files`	11.4 ± 0.2	11.0	11.8	1.00
`prek-head run identity --all-files`	11.5 ± 0.2	11.2	11.8	1.00 ± 0.03

Benchmark `detect-private-key` in hyperfine script

Reduce file operations in fix_end_of_file.

Copilot

Pull request overview

This PR improves performance of the detect-private-key builtin hook by avoiding full-file reads and using an Aho–Corasick matcher for multi-pattern scanning, and wires the hook into the benchmarking script.

Changes:

Switch detect-private-key to chunked scanning with a small carry-over window + aho-corasick matcher.
Add aho-corasick as a workspace dependency and to the prek crate.
Include detect-private-key in the hyperfine benchmark hook list; minor refactor in fix_end_of_file file scanning.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

File	Description
scripts/hyperfine-run-benchmarks.sh	Adds `detect-private-key` to the list of individually benchmarked hooks.
crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs	Refactors EOF fixer to reuse file size and use a fixed buffer for scanning.
crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs	Implements chunked scanning with carry-over and Aho–Corasick for faster detection.
crates/prek/Cargo.toml	Adds `aho-corasick` dependency to the `prek` crate.
Cargo.toml	Adds `aho-corasick` to workspace dependencies.
Cargo.lock	Locks the new dependency.

Comments suppressed due to low confidence (1)

crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs:94

find_last_non_ending seeks backwards using SeekFrom::Current(-block_size), but after each read_exact the cursor ends up back at EOF, so the next iteration re-reads the same tail block again. If a file ends with >4KB of line endings, this can incorrectly conclude the whole file is line endings and truncate it to empty. Consider seeking to an absolute offset based on data_len - read_len - block_size (or using SeekFrom::End(-(read_len as i64 + block_size as i64))) so each loop scans progressively earlier blocks.

    while read_len < data_len {
        let block_size = MAX_SCAN_SIZE.min(usize::try_from(data_len - read_len)?);
        // SAFETY: block_size is guaranteed to be less than or equal to MAX_SCAN_SIZE
        reader
            .seek(SeekFrom::Current(-i64::try_from(block_size).unwrap()))
            .await?;
        reader.read_exact(&mut buf[..block_size]).await?;
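The absolute-offset approach suggested in the comment can be sketched like this. It is a synchronous `std::io` stand-in for the async reader in the snippet, offered as an assumption-laden illustration: `last_non_newline_offset`, its return shape, and the `BLOCK` constant are hypothetical and are not taken from `fix_end_of_file.rs`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Assumed block size, mirroring the ">4KB" remark in the review comment.
const BLOCK: u64 = 4096;

/// Return the offset of the last byte that is not `\n` or `\r`, or `None`
/// if the stream is empty or consists only of line endings.
pub fn last_non_newline_offset<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let data_len = reader.seek(SeekFrom::End(0))?;
    let mut buf = [0u8; BLOCK as usize];
    let mut scanned = 0u64; // bytes already examined, counted from the end

    while scanned < data_len {
        let block = BLOCK.min(data_len - scanned);
        // Absolute offset: each iteration starts one block earlier than the
        // previous one, so no tail block is ever read twice.
        let start = data_len - scanned - block;
        reader.seek(SeekFrom::Start(start))?;
        let chunk = &mut buf[..block as usize];
        reader.read_exact(chunk)?;

        if let Some(i) = chunk.iter().rposition(|&b| b != b'\n' && b != b'\r') {
            return Ok(Some(start + i as u64));
        }
        scanned += block;
    }
    Ok(None)
}
```

Because the seek target is derived from `data_len` and `scanned` rather than from the current cursor, the position left behind by `read_exact` no longer matters.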

crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs

Copilot AI review requested due to automatic review settings March 15, 2026 07:34

j178 added the performance label Mar 15, 2026

Copilot started reviewing on behalf of j178 March 15, 2026 07:36

j178 changed the base branch from master to cache-file-tags-filtering March 15, 2026 07:36

Copilot AI reviewed Mar 15, 2026

crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs (resolved)

j178 force-pushed the optimize-detect-private-key branch from a8ad45c to f475271 Compare March 15, 2026 09:01

j178 changed the base branch from cache-file-tags-filtering to master March 15, 2026 09:24

j178 added 2 commits March 15, 2026 17:26

Optimize detect_private_key by chunked reading and using aho-corasick

4ea26ba

Benchmark `detect-private-key` in hyperfine script

Optimize fix_end_of_file by reducing file operations (#1792)

f441983

Reduce file operations in fix_end_of_file.

Copilot AI review requested due to automatic review settings March 15, 2026 09:27

j178 force-pushed the optimize-detect-private-key branch from 69e87bb to f441983 Compare March 15, 2026 09:27

j178 merged commit da182c8 into master Mar 15, 2026
50 checks passed

j178 deleted the optimize-detect-private-key branch March 15, 2026 09:27

Copilot started reviewing on behalf of j178 March 15, 2026 09:29

Copilot AI reviewed Mar 15, 2026

crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs (resolved)

crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs (resolved)

BrewTestBot mentioned this pull request Mar 16, 2026

prek 0.3.6 Homebrew/homebrew-core#272532

Merged
