perf: use Aho-Corasick for efficient redaction#7931
Conversation
Replace the naive O(n*m) redaction implementation with Aho-Corasick automaton that finds all patterns in a single pass - O(n + z) where n is input length and z is number of matches. Changes: - Add Redactor struct with cached Aho-Corasick automaton - Update Config to use Redactor instead of IndexSet<String> - Update CmdLineRunner to use Redactor - Fix logger to redact once instead of separately for file/terminal - Add aho-corasick as direct dependency (was already transitive via regex) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
Pull request overview
This PR optimizes string redaction by replacing the naive O(n*m) pattern-matching approach with the Aho-Corasick algorithm, achieving O(n + z) complexity. The change introduces a Redactor struct that caches the compiled automaton and refactors the logger to perform redaction once instead of separately for each output destination.
Changes:
- Introduced
Redactorstruct with Aho-Corasick automaton for efficient multi-pattern matching - Refactored
Configto useRedactorinstead ofIndexSet<String>for redactions - Updated logger to redact once before rendering for both file and terminal outputs
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| Cargo.toml | Added aho-corasick dependency |
| src/redactions.rs | Added Redactor struct with Aho-Corasick-based redaction and comprehensive tests |
| src/config/mod.rs | Refactored to use Redactor instead of IndexSet<String>, updated redaction methods |
| src/cmd.rs | Updated CmdLineRunner to use Redactor for output redaction |
| src/logger.rs | Refactored to redact once at the top of log() before rendering |
| src/task/task_executor.rs | Updated redact calls to pass string references |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/config/mod.rs
Outdated
| input = input.replace(redaction, "[redacted]"); | ||
| } | ||
| input | ||
| Arc::new(_REDACTOR.lock().unwrap().patterns().clone()) |
There was a problem hiding this comment.
This method creates a new Arc wrapping a clone of the patterns on every call. Since patterns() already returns a reference to an Arc<IndexSet<String>> internally, consider returning Arc::clone(&self.patterns) from within the Redactor to avoid the double allocation and clone.
| Arc::new(_REDACTOR.lock().unwrap().patterns().clone()) | |
| Arc::clone(_REDACTOR.lock().unwrap().patterns()) |
| let args = if config::is_loaded() { | ||
| Config::get_().redact(&args) | ||
| } else { | ||
| args | ||
| }; |
There was a problem hiding this comment.
The redaction is performed even when both will_log_file and will_log_term would never use the result. Consider moving this redaction inside the conditional blocks after checking will_log_file || will_log_term to avoid unnecessary work when logging is disabled for the current level.
src/redactions.rs
Outdated
| pub fn redact_string(&self, input: String) -> String { | ||
| if self.automaton.is_none() { | ||
| return input; | ||
| } | ||
| self.redact(&input) | ||
| } |
There was a problem hiding this comment.
The redact_string method doesn't actually redact in place as its documentation suggests. It converts to &str, calls redact() which allocates a new String, and discards the original. This provides no performance benefit over redact(). Consider either implementing true in-place redaction using AhoCorasick::try_stream_replace_all or removing this method.
|
bugbot run |
Hyperfine Performance
|
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
mise-2026.1.12 x -- echo |
20.3 ± 0.6 | 18.9 | 22.7 | 1.00 |
mise x -- echo |
20.8 ± 0.8 | 19.3 | 23.2 | 1.03 ± 0.05 |
mise env
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
mise-2026.1.12 env |
19.8 ± 0.8 | 18.5 | 25.8 | 1.00 |
mise env |
20.1 ± 0.7 | 18.6 | 22.2 | 1.02 ± 0.05 |
mise hook-env
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
mise-2026.1.12 hook-env |
20.4 ± 0.5 | 19.2 | 21.8 | 1.00 |
mise hook-env |
20.7 ± 0.8 | 19.3 | 23.4 | 1.02 ± 0.05 |
mise ls
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
mise-2026.1.12 ls |
18.1 ± 0.5 | 17.0 | 22.1 | 1.00 |
mise ls |
18.6 ± 0.7 | 17.2 | 20.4 | 1.02 ± 0.05 |
xtasks/test/perf
| Command | mise-2026.1.12 | mise | Variance |
|---|---|---|---|
| install (cached) | 111ms | 111ms | +0% |
| ls (cached) | 70ms | 70ms | +0% |
| bin-paths (cached) | 74ms | 74ms | +0% |
| task-ls (cached) | 534ms | 532ms | +0% |
- Remove unused `redactor()` method from Config - Remove unused `redact_string()` and `is_empty()` from Redactor - Add `patterns_arc()` to avoid double allocation in `redactions()` - Add fallback to naive approach if automaton fails to build Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
### 🚀 Features - **(edit)** add interactive config editor (`mise edit`) by @jdx in [#7930](#7930) - **(lockfile)** graduate lockfiles from experimental by @jdx in [#7929](#7929) - **(task)** add support for usage values in task confirm dialog by @roele in [#7924](#7924) - **(task)** improve source freshness checking with edge case handling by @jdx in [#7932](#7932) ### 🐛 Bug Fixes - **(activate)** preserve ordering of paths appended after mise activate by @jdx in [#7919](#7919) - **(install)** sort failed installations for deterministic error output by @jdx in [#7936](#7936) - **(lockfile)** preserve URL and prefer sha256 when merging platform info by @jdx in [#7923](#7923) - **(lockfile)** add atomic writes and cache invalidation by @jdx in [#7927](#7927) - **(templates)** use sha256 for hash filter instead of blake3 by @jdx in [#7925](#7925) - **(upgrade)** respect tracked configs when pruning old versions by @jdx in [#7926](#7926) ### 🚜 Refactor - **(progress)** migrate from indicatif to clx by @jdx in [#7928](#7928) ### 📚 Documentation - improve clarity on uvx and pipx dependencies by @ygormutti in [#7878](#7878) ### ⚡ Performance - **(install)** use Kahn's algorithm for dependency scheduling by @jdx in [#7933](#7933) - use Aho-Corasick for efficient redaction by @jdx in [#7931](#7931) ### 🧪 Testing - remove flaky test_http_version_list test by @jdx in [#7934](#7934) ### Chore - use github backend instead of ubi in mise.lock by @jdx in [#7922](#7922) ### New Contributors - @ygormutti made their first contribution in [#7878](#7878)
## Summary - Replace naive O(n*m) redaction with Aho-Corasick automaton that finds all patterns in single pass - O(n + z) - Add `Redactor` struct that caches the compiled automaton - Fix logger to redact once instead of separately for file and terminal output (was doing double work when both enabled) ## Changes - Add `aho-corasick` as direct dependency (was already transitive via regex) - Add `Redactor` struct to `redactions.rs` with cached Aho-Corasick automaton - Update `Config` to use `Redactor` instead of `IndexSet<String>` - Update `CmdLineRunner` to use `Redactor` for output redaction - Refactor logger to redact once at the top of `log()` before rendering ## Test plan - [x] Unit tests for `Redactor` pass - [x] E2E redaction tests pass (`e2e/tasks/test_task_redactions`, `e2e/cli/test_env_redacted_flags`) - [x] All lints pass 🤖 Generated with [Claude Code](https://claude.ai/code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Moderate risk because it changes the core redaction implementation used for logs and command/task output; bugs could lead to missed or incorrect redactions despite being mostly a performance refactor. > > **Overview** > Switches string redaction from repeated `str::replace` loops to a cached Aho-Corasick automaton via a new `Redactor` type (with unit tests), and adds `aho-corasick` as a direct dependency. > > Wires the new redactor through `Config` (replacing the global `IndexSet` storage), `CmdLineRunner` output handling, and task command rendering, and refactors `Logger::log` to redact once and reuse the sanitized message for both terminal and file outputs. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 9287026. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
### 🚀 Features - **(edit)** add interactive config editor (`mise edit`) by @jdx in [jdx#7930](jdx#7930) - **(lockfile)** graduate lockfiles from experimental by @jdx in [jdx#7929](jdx#7929) - **(task)** add support for usage values in task confirm dialog by @roele in [jdx#7924](jdx#7924) - **(task)** improve source freshness checking with edge case handling by @jdx in [jdx#7932](jdx#7932) ### 🐛 Bug Fixes - **(activate)** preserve ordering of paths appended after mise activate by @jdx in [jdx#7919](jdx#7919) - **(install)** sort failed installations for deterministic error output by @jdx in [jdx#7936](jdx#7936) - **(lockfile)** preserve URL and prefer sha256 when merging platform info by @jdx in [jdx#7923](jdx#7923) - **(lockfile)** add atomic writes and cache invalidation by @jdx in [jdx#7927](jdx#7927) - **(templates)** use sha256 for hash filter instead of blake3 by @jdx in [jdx#7925](jdx#7925) - **(upgrade)** respect tracked configs when pruning old versions by @jdx in [jdx#7926](jdx#7926) ### 🚜 Refactor - **(progress)** migrate from indicatif to clx by @jdx in [jdx#7928](jdx#7928) ### 📚 Documentation - improve clarity on uvx and pipx dependencies by @ygormutti in [jdx#7878](jdx#7878) ### ⚡ Performance - **(install)** use Kahn's algorithm for dependency scheduling by @jdx in [jdx#7933](jdx#7933) - use Aho-Corasick for efficient redaction by @jdx in [jdx#7931](jdx#7931) ### 🧪 Testing - remove flaky test_http_version_list test by @jdx in [jdx#7934](jdx#7934) ### Chore - use github backend instead of ubi in mise.lock by @jdx in [jdx#7922](jdx#7922) ### New Contributors - @ygormutti made their first contribution in [jdx#7878](jdx#7878)
Summary
Redactorstruct that caches the compiled automatonChanges
aho-corasickas direct dependency (was already transitive via regex)Redactorstruct toredactions.rswith cached Aho-Corasick automatonConfigto useRedactorinstead ofIndexSet<String>CmdLineRunnerto useRedactorfor output redactionlog()before renderingTest plan
Redactorpasse2e/tasks/test_task_redactions,e2e/cli/test_env_redacted_flags)🤖 Generated with Claude Code
Note
Medium Risk
Moderate risk because it changes the core redaction implementation used for logs and command/task output; bugs could lead to missed or incorrect redactions despite being mostly a performance refactor.
Overview
Switches string redaction from repeated
str::replaceloops to a cached Aho-Corasick automaton via a newRedactortype (with unit tests), and addsaho-corasickas a direct dependency.Wires the new redactor through
Config(replacing the globalIndexSetstorage),CmdLineRunneroutput handling, and task command rendering, and refactorsLogger::logto redact once and reuse the sanitized message for both terminal and file outputs.Written by Cursor Bugbot for commit 9287026. This will update automatically on new commits. Configure here.