Optimize detect_private_key by chunked reading and using aho-corasick #1791
Pull request overview
Optimizes private key detection (a built-in pre-commit hook) by switching from whole-file reads to chunked scanning with an Aho–Corasick matcher, and reduces repeated file-tag identification work during hook filtering.
Changes:
- Implement chunked streaming scan in `detect_private_key` using a prebuilt `aho_corasick::AhoCorasick` matcher with boundary carryover.
- Cache `tags_from_path` results per project in `FileFilter` to avoid recomputing tags during multiple filtering passes.
- Add `aho-corasick` as a workspace dependency and enable it for the `prek` crate.
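The boundary carryover idea can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: the marker list is hypothetical, `scan_chunked` and `contains_marker` are made-up names, and a naive window search stands in for the Aho–Corasick matcher so the example has no external dependencies.

```rust
use std::io::{self, Read};

/// Hypothetical marker list; the real hook's patterns may differ.
const MARKERS: &[&[u8]] = &[
    b"BEGIN RSA PRIVATE KEY",
    b"BEGIN OPENSSH PRIVATE KEY",
    b"BEGIN EC PRIVATE KEY",
];

/// Naive multi-pattern search, standing in for an Aho-Corasick matcher.
fn contains_marker(haystack: &[u8]) -> bool {
    MARKERS
        .iter()
        .any(|m| haystack.windows(m.len()).any(|w| w == *m))
}

/// Scan `reader` with reads of at most `chunk_size` bytes. The last
/// `max_len - 1` bytes of each window are carried over to the front of the
/// buffer, so a marker split across two reads is still found.
fn scan_chunked<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<bool> {
    let max_len = MARKERS.iter().map(|m| m.len()).max().unwrap_or(0);
    let carry_len = max_len.saturating_sub(1);
    let mut buf = vec![0u8; carry_len + chunk_size];
    let mut filled = 0; // bytes carried over from the previous window

    loop {
        let n = reader.read(&mut buf[filled..filled + chunk_size])?;
        if n == 0 {
            return Ok(false); // end of input, nothing matched
        }
        let end = filled + n;
        if contains_marker(&buf[..end]) {
            return Ok(true);
        }
        // Move the tail of the window to the front for the next iteration.
        let keep = carry_len.min(end);
        buf.copy_within(end - keep..end, 0);
        filled = keep;
    }
}
```

Carrying `max_len - 1` bytes is enough because any marker that straddles a read boundary has at most that many bytes before the boundary. Memory use stays bounded by `carry_len + chunk_size` regardless of file size.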
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs | Stream-based scanning with Aho–Corasick + small carry buffer to catch boundary-split markers. |
| crates/prek/src/cli/run/filter.rs | Adds a per-project tag cache (file_tags) used by type/tag filtering. |
| crates/prek/Cargo.toml | Adds aho-corasick dependency for the prek crate. |
| Cargo.toml | Adds aho-corasick to workspace dependencies. |
| Cargo.lock | Records the new dependency in the prek package’s dependency list. |
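A per-project tag cache of the kind described for `filter.rs` might look roughly like the following. This is only a sketch under stated assumptions: `TagCache`, its `misses` counter, and this `tags_from_path` are hypothetical stand-ins; only the `file_tags` field name comes from the summary above.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real tag identification, which is assumed
/// to be expensive enough to be worth caching.
fn tags_from_path(path: &Path) -> Vec<&'static str> {
    match path.extension().and_then(|e| e.to_str()) {
        Some("json") => vec!["file", "text", "json"],
        Some("rs") => vec!["file", "text", "rust"],
        _ => vec!["file"],
    }
}

/// Illustrative cache: tags are computed once per path and reused by every
/// later filtering pass.
#[derive(Default)]
struct TagCache {
    file_tags: HashMap<PathBuf, Vec<&'static str>>,
    misses: usize, // counts how often tags had to be computed
}

impl TagCache {
    fn tags(&mut self, path: &Path) -> &[&'static str] {
        if !self.file_tags.contains_key(path) {
            self.misses += 1;
            self.file_tags
                .insert(path.to_path_buf(), tags_from_path(path));
        }
        &self.file_tags[path]
    }
}
```

With this shape, filtering the same file for several hooks calls `tags_from_path` only on the first lookup.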
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #1791 +/- ##
=======================================
Coverage 91.72% 91.72%
=======================================
Files 98 98
Lines 20025 20037 +12
=======================================
+ Hits 18367 18379 +12
Misses 1658 1658
📦 Cargo Bloat Comparison
Binary size change: +0.00% (24.8 MiB → 24.8 MiB)
⚡️ Hyperfine Benchmarks
Summary: 0 regressions, 0 improvements above the 10% threshold.
CLI Commands
Benchmarking basic commands in the main repo:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base --version` | 2.4 ± 0.1 | 2.2 | 2.6 | 1.01 ± 0.08 |
| `prek-head --version` | 2.4 ± 0.2 | 2.2 | 2.8 | 1.00 |
prek list
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base list` | 9.1 ± 0.5 | 8.7 | 12.4 | 1.00 |
| `prek-head list` | 9.3 ± 0.7 | 8.7 | 12.8 | 1.02 ± 0.09 |
prek validate-config .pre-commit-config.yaml
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base validate-config .pre-commit-config.yaml` | 3.1 ± 0.1 | 3.0 | 3.2 | 1.01 ± 0.02 |
| `prek-head validate-config .pre-commit-config.yaml` | 3.1 ± 0.0 | 3.0 | 3.2 | 1.00 |
prek sample-config
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base sample-config` | 2.6 ± 0.0 | 2.6 | 2.7 | 1.00 |
| `prek-head sample-config` | 2.8 ± 0.2 | 2.6 | 3.3 | 1.07 ± 0.08 |
Cold vs Warm Runs
Comparing first run (cold) vs subsequent runs (warm cache):
prek run --all-files (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 150.9 ± 4.1 | 147.3 | 161.8 | 1.00 |
| `prek-head run --all-files` | 152.8 ± 2.1 | 149.2 | 155.3 | 1.01 ± 0.03 |
prek run --all-files (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 151.9 ± 4.2 | 147.7 | 161.7 | 1.00 |
| `prek-head run --all-files` | 153.1 ± 5.0 | 147.4 | 165.7 | 1.01 ± 0.04 |
Full Hook Suite
Running the builtin hook suite on the benchmark workspace:
prek run --all-files (full builtin hook suite)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 151.7 ± 4.2 | 145.2 | 163.0 | 1.00 |
| `prek-head run --all-files` | 157.2 ± 22.9 | 147.5 | 312.0 | 1.04 ± 0.15 |
Individual Hook Performance
Benchmarking each hook individually on the test repo:
prek run trailing-whitespace --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run trailing-whitespace --all-files` | 22.4 ± 0.4 | 21.6 | 23.4 | 1.00 |
| `prek-head run trailing-whitespace --all-files` | 22.5 ± 1.3 | 21.3 | 27.3 | 1.00 ± 0.06 |
prek run end-of-file-fixer --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run end-of-file-fixer --all-files` | 28.9 ± 2.4 | 25.3 | 34.5 | 1.06 ± 0.10 |
| `prek-head run end-of-file-fixer --all-files` | 27.2 ± 1.3 | 25.4 | 30.6 | 1.00 |
prek run check-json --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-json --all-files` | 12.9 ± 0.5 | 12.1 | 13.8 | 1.01 ± 0.09 |
| `prek-head run check-json --all-files` | 12.8 ± 1.0 | 11.8 | 15.6 | 1.00 |
prek run check-yaml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-yaml --all-files` | 12.1 ± 0.2 | 11.7 | 12.8 | 1.00 |
| `prek-head run check-yaml --all-files` | 12.6 ± 1.1 | 11.9 | 16.5 | 1.05 ± 0.09 |
prek run check-toml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-toml --all-files` | 12.3 ± 0.2 | 11.9 | 12.6 | 1.00 |
| `prek-head run check-toml --all-files` | 12.7 ± 1.1 | 11.7 | 15.9 | 1.04 ± 0.09 |
prek run check-xml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-xml --all-files` | 12.4 ± 0.3 | 11.9 | 12.9 | 1.00 |
| `prek-head run check-xml --all-files` | 12.5 ± 0.4 | 11.9 | 13.3 | 1.01 ± 0.04 |
prek run detect-private-key --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run detect-private-key --all-files` | 16.1 ± 1.3 | 14.6 | 19.8 | 1.00 |
| `prek-head run detect-private-key --all-files` | 18.4 ± 1.0 | 16.5 | 21.0 | 1.14 ± 0.11 |
Installation Performance
Benchmarking hook installation (fast path hooks skip Python setup):
prek install-hooks (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 4.7 ± 0.0 | 4.7 | 4.8 | 1.00 |
| `prek-head install-hooks` | 4.7 ± 0.1 | 4.7 | 4.8 | 1.00 ± 0.01 |
prek install-hooks (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 4.7 ± 0.0 | 4.7 | 4.8 | 1.00 |
| `prek-head install-hooks` | 4.9 ± 0.4 | 4.7 | 5.6 | 1.05 ± 0.08 |
File Filtering/Scoping Performance
Testing different file selection modes:
prek run (staged files only)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run` | 18.6 ± 0.2 | 18.3 | 18.9 | 1.00 |
| `prek-head run` | 18.7 ± 0.2 | 18.4 | 19.1 | 1.00 ± 0.01 |
prek run --files '*.json' (specific file type)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --files '*.json'` | 7.5 ± 0.1 | 7.4 | 7.7 | 1.00 |
| `prek-head run --files '*.json'` | 8.6 ± 1.1 | 7.4 | 11.3 | 1.15 ± 0.14 |
Workspace Discovery & Initialization
Benchmarking hook discovery and initialization overhead:
prek run --dry-run --all-files (measures init overhead)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --dry-run --all-files` | 10.5 ± 0.1 | 10.3 | 11.0 | 1.00 ± 0.02 |
| `prek-head run --dry-run --all-files` | 10.5 ± 0.1 | 10.3 | 10.7 | 1.00 |
Meta Hooks Performance
Benchmarking meta hooks separately:
prek run check-hooks-apply --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-hooks-apply --all-files` | 14.3 ± 0.1 | 14.2 | 14.6 | 1.00 |
| `prek-head run check-hooks-apply --all-files` | 15.1 ± 1.2 | 14.1 | 17.4 | 1.05 ± 0.08 |
prek run check-useless-excludes --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-useless-excludes --all-files` | 13.6 ± 0.7 | 12.4 | 14.3 | 1.08 ± 0.06 |
| `prek-head run check-useless-excludes --all-files` | 12.6 ± 0.2 | 12.4 | 13.0 | 1.00 |
prek run identity --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run identity --all-files` | 11.4 ± 0.2 | 11.0 | 11.8 | 1.00 |
| `prek-head run identity --all-files` | 11.5 ± 0.2 | 11.2 | 11.8 | 1.00 ± 0.03 |
Force-pushed from a8ad45c to f475271
Benchmark `detect-private-key` in hyperfine script
Reduce file operations in fix_end_of_file.
Force-pushed from 69e87bb to f441983
Pull request overview
This PR improves performance of the detect-private-key builtin hook by avoiding full-file reads and using an Aho–Corasick matcher for multi-pattern scanning, and wires the hook into the benchmarking script.
Changes:
- Switch `detect-private-key` to chunked scanning with a small carry-over window + `aho-corasick` matcher.
- Add `aho-corasick` as a workspace dependency and to the `prek` crate.
- Include `detect-private-key` in the hyperfine benchmark hook list; minor refactor in `fix_end_of_file` file scanning.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/hyperfine-run-benchmarks.sh | Adds detect-private-key to the list of individually benchmarked hooks. |
| crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs | Refactors EOF fixer to reuse file size and use a fixed buffer for scanning. |
| crates/prek/src/hooks/pre_commit_hooks/detect_private_key.rs | Implements chunked scanning with carry-over and Aho–Corasick for faster detection. |
| crates/prek/Cargo.toml | Adds aho-corasick dependency to the prek crate. |
| Cargo.toml | Adds aho-corasick to workspace dependencies. |
| Cargo.lock | Locks the new dependency. |
Comments suppressed due to low confidence (1)
crates/prek/src/hooks/pre_commit_hooks/fix_end_of_file.rs:94
`find_last_non_ending` seeks backwards using `SeekFrom::Current(-block_size)`, but after each `read_exact` the cursor ends up back at EOF, so the next iteration re-reads the same tail block again. If a file ends with >4KB of line endings, this can incorrectly conclude the whole file is line endings and truncate it to empty. Consider seeking to an absolute offset based on `data_len - read_len - block_size` (or using `SeekFrom::End(-(read_len as i64 + block_size as i64))`) so each loop scans progressively earlier blocks.
while read_len < data_len {
let block_size = MAX_SCAN_SIZE.min(usize::try_from(data_len - read_len)?);
// SAFETY: block_size is guaranteed to be less than or equal to MAX_SCAN_SIZE
reader
.seek(SeekFrom::Current(-i64::try_from(block_size).unwrap()))
.await?;
reader.read_exact(&mut buf[..block_size]).await?;
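The absolute-offset approach suggested in the review comment could be sketched as follows. This is a synchronous, simplified illustration using `std::io` rather than the async reader in the snippet above; the function name `last_non_ending`, its signature, and its return convention are assumptions made for the example, not the project's actual code.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Return the offset just past the last byte that is not `\n` or `\r`,
/// or 0 if the whole input consists of line endings.
/// `data_len` is assumed to equal the total length of the stream.
fn last_non_ending<R: Read + Seek>(
    reader: &mut R,
    data_len: u64,
    max_block: usize,
) -> io::Result<u64> {
    assert!(max_block > 0, "block size must be non-zero");
    let mut buf = vec![0u8; max_block];
    let mut read_len: u64 = 0; // bytes already scanned, counted from the end

    while read_len < data_len {
        let block = (max_block as u64).min(data_len - read_len) as usize;
        // Seek relative to the end, so the position does not depend on where
        // the previous read left the cursor.
        reader.seek(SeekFrom::End(-((read_len + block as u64) as i64)))?;
        reader.read_exact(&mut buf[..block])?;

        if let Some(pos) = buf[..block]
            .iter()
            .rposition(|&b| b != b'\n' && b != b'\r')
        {
            let block_start = data_len - read_len - block as u64;
            return Ok(block_start + pos as u64 + 1);
        }
        read_len += block as u64;
    }
    Ok(0)
}
```

Because each seek is computed from `read_len`, every iteration scans a strictly earlier block, including when the trailing run of line endings is longer than one block.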
Optimize detect_private_key scanning.